## Abstract

The closed-loop operation of brain-machine interfaces (BMI) provides a framework to study the mechanisms behind neural control through a restricted output channel, with emerging clinical applications to stroke, degenerative disease, and trauma. Despite significant empirically driven improvements in closed-loop BMI systems, a fundamental, experimentally validated theory of closed-loop BMI operation is lacking. Here we propose a compact model based on stochastic optimal control to describe the brain in skillfully operating canonical decoding algorithms. The model produces goal-directed BMI movements with sensory feedback and intrinsically noisy neural output signals. Various experimentally validated phenomena emerge naturally from this model, including performance deterioration with bin width, compensation of biased decoders, and shifts in tuning curves between arm control and BMI control.

Analysis of the model provides insight into possible mechanisms underlying these behaviors, with testable predictions. Spike binning may erode performance in part from intrinsic control-dependent constraints, regardless of decoding accuracy. In compensating decoder bias, the brain may incur an energetic cost associated with action potential production. Tuning curve shifts, seen after the mastery of a BMI-based skill, may reflect the brain's implementation of a new closed-loop control policy. The direction and magnitude of tuning curve shifts may be altered by decoder structure, ensemble size, and the costs of closed-loop control. Looking forward, the model provides a framework for the design and simulated testing of an emerging class of BMI algorithms that seek to directly exploit the presence of a human in the loop.

## 1. Introduction

The closed-loop operation of brain-machine interfaces (BMI) provides a framework to study the mechanisms behind neural control of behavior through a severely restricted output channel. In the basic prototype, measurements of the subject's neural signals are translated by a BMI algorithm into movements of a computer cursor (or other device) that the subject attempts to control, bypassing the need for overt movement. As such, BMI is a therapeutic possibility for severe paralysis from stroke, trauma, and degenerative disease. Despite significant ongoing improvements in the design of closed-loop BMI systems, a fundamental, empirically validated theory of closed-loop BMI operation is lacking.

Although the BMI concept has its roots in operant conditioning (Fetz, 1969), present-day BMI algorithms have built on open-loop decoding, where natural arm movements are reconstructed from concurrently measured neural activity (Brockwell, Rojas, & Kass, 2004; Eden, Frank, Barbieri, Solo, & Brown, 2004; Georgopoulos, Schwartz, & Kettner, 1986; Wu, Gao, Bienenstock, Donoghue, & Black, 2006). Subsequently, these same algorithms were implemented in closed-loop decoding (Carmena et al., 2003; Serruya, Hatsopoulos, Paninski, Fellows, & Donoghue, 2002; Taylor, Tillery, & Schwartz, 2002). This means that the subject was allowed to view the output of the algorithm in real time, represented in the movement of a cursor or robotic arm. In many demonstrations, subjects have been able to control closed-loop cursor behavior without making overt arm movements.

In various experiments with BMI, closed-loop decoding significantly outperforms open-loop decoding. At least three additional phenomena of closed-loop BMI control are rigorously described in the literature. First, closed-loop control deteriorates with increasing bin width in spike-based BMI (Cunningham et al., 2011). Second, closed-loop decoding is capable of compensating for BMI algorithms that demonstrate systematic bias in open-loop decoding (Chase, Schwartz, & Kass, 2009; Jarosiewicz et al., 2008; Koyama et al., 2010). Third, neural ensembles in motor cortex demonstrate that preferred direction of velocity tuning curves shifts between arm control and closed-loop BMI control (Carmena et al., 2003; Ganguly & Carmena, 2009; Ganguly, Dimitrov, Wallis, & Carmena, 2011; Jarosiewicz et al., 2008; Taylor et al., 2002), as well as between closed-loop operation of different BMI algorithms (Ganguly et al., 2011).

While it is generally believed that feedback to the subject explains dramatic improvements in closed- versus open-loop BMI performance, no comprehensive model has been developed to validate this hypothesis. Such a model would need to generalize beyond closed-loop performance improvements to simultaneously explain various other coexisting phenomena: deterioration with increasing bin width (Cunningham et al., 2011), compensation of decoder bias (Chase et al., 2009; Jarosiewicz et al., 2008; Koyama et al., 2010), and shifting tuning curves (Carmena et al., 2003; Ganguly & Carmena, 2009; Ganguly et al., 2011; Jarosiewicz et al., 2008; Taylor et al., 2002). Here, we propose to explain these various phenomena of BMI operation by describing the brain in closed-loop mode as a stochastic optimal controller. This class of models has been extensively explored in the description of natural motor coordination (Diedrichsen, Shadmehr, & Ivry, 2009; Scott, 2004; Todorov, 2004). In related work, suboptimal control based on iterative parameter optimization was introduced in a previous modeling study of learning in BMI (Heliot, Ganguly, Jimenez, & Carmena, 2010).

In our BMI model framework (see Figure 1A),
stochastic optimal control is implemented by a neural control network that attempts
to steer neural activity (*n _{k}*) in primary motor cortex using a representation of velocity
(

*u*) with the goal of directing the observed cursor (

_{k}*x*) to desired targets. In effect, the neural control network steers a composite system (see Figure 1A, enclosed by the dashed outline) that includes primary motor cortex and the neuroprosthetic device. By analogy to the motor control literature, the principle of optimality represents performance when learning is complete rather than describing intermediate performance. The resulting stochastic dynamical systems model demonstrates a richness of behavior sufficient to reproduce differences in closed- versus open-loop BMI performance, while simultaneously explaining key phenomena of closed-loop BMI operation through a concise mathematical framework.

_{k}## 2. Methods

### 2.1. Models for the Neuroprosthetic Device.

We explored models for the neuroprosthetic device corresponding to three frequently used BMI algorithms: Kalman filter (KF), optimal linear estimator (OLE), and population vector algorithm (PVA). Each neuroprosthetic device model receives spike times from an ensemble of simulated primary motor cortical neurons as input and updates the position and velocity of the simulated cursor as output. Equations for the implementation of these various filters are provided in appendix B. Parameters of the KF, PVA, and OLE were fit using maximum likelihood estimation.

In our analysis, the neuroprosthetic device is a system, defined by a state and
dynamics. The state *x _{k}* is a 5 × 1 vector containing cursor position, velocity, and the
number 1 (used to implement affine dynamics). The dynamics are governed by the
corresponding BMI filter equations, which specify the future state

*x*

_{k+1}in terms of the previous state

*x*and neural signals

_{k}*n*. The specific mathematical form of these dynamics can simplify or complicate the derivation of a neural control network model. In particular, we desire linear dynamics governed by parameters that are invariant with time (called LTI systems).

_{k}For the OLE and PVA, the corresponding neuroprosthetic device is LTI. For the KF, parameters change with the Kalman gain. As a result, the standard KF is linear but not time invariant. Under typical conditions, the Kalman gain has an asymptotic value that is independent of the neural signal input. By initializing Kalman gain to this steady-state value, the KF-based device becomes LTI. This variation on the KF is called the steady-state KF, which substantially simplifies derivation of the neural control network model. As an implementation detail on the KF case, note that the neural control network assumes the KF-based neuroprosthetic device is a steady-state KF, while the simulated device itself executes the standard KF.

### 2.2. Model for Primary Motor Cortex.

*b*be the preferred vector and

_{i}*c*be the minimum firing rate of neuron

_{i}*i*. The mean firing rate of this neuron is then

*i*th bin of width seconds is distributed as In our model, we choose

*c*= 10 spikes/s and with uniformly drawn on [0, 2]. This resulted in uniformly random preferred directions, minimum firing rate of 10 spikes/s, and a firing rate of 24 spikes/s for intended movements at 20 cm/s along the preferred direction, as used previously (Srinivasan, Eden, Willsky, & Brown, 2006). As a technical aside, substantially higher firing rates have been used in conjunction with this model in a three-dimensional simulated environment (Cunningham et al., 2011). While our preliminary work initially employed those higher firing rates, we found decoding accuracy to be unrealistically high in our two-dimensional work space.

_{i}### 2.3. Model for Neural Control Network.

Our next challenge was to represent the brain in closed-loop operation of the
neuroprosthetic device. Experimental results have suggested that feedback may
explain the diversity of phenomena in closed-loop BMI operation. The stochastic
optimal control framework was a natural choice to examine this specific
hypothesis. This controller incorporates feedback on the neuroprosthetic cursor
position and velocity (*x _{k}*) to steer neural activity recorded from motor cortex by determining
the intended cursor velocity (

*u*) using a rule (or function) called a control policy. The control policy embodies the input-output relationship of the neural control network (see the system diagram in Figure 1). Equations for the derivation and implementation of this control policy are provided in appendix C. Below, we provide a brief introduction to the neural control network with a focus on the high-level concepts.

_{k}*x*) and intended velocity (

_{k}*u*) to encourage spike patterns that result in goal-directed cursor movement that stops at the origin:

_{k}*Q*and

*R*are square matrices that determine the relative costs associated with state and control, respectively. Choice of

*Q*results in reaching movements. Choice of

*R*produces realistic neuronal firing rates. In our specific example,

*x*is a 5 × 1 vector that contains position and velocity in two dimensions at time step

_{k}*k*, as well as the constant 1 in the fifth row, to allow for affine control policies. Also,

*u*is a 2 × 1 vector containing the intended 2D velocity at time step

_{k}*k*. We employed basic choices of

*Q*and

*R*to match these dimensions: where , , and are scalars. The relative weighting of these two costs (

*Q*and

*R*) was tuned by hand in order to achieve realistic reaching movements and firing rates. When the

*R*was too low, neural firing rates were unrealistically high. When Q was too low, realistic reaching movements were not produced. Nevertheless, model behavior was preserved under a wide range of parameters, as described in appendix E.

*w*): Here,

_{k}*A*and

*B*represent the entire plant, including the combined effect of output neurons and the neuroprosthetic device. For our example,

*A*is 5 × 5 (to match the dimensions of

*x*), and B is 5 × 2 (because

_{k}*u*is 2 × 1). This formulation greatly simplifies the model implementation and analysis, bringing it into the realm of established theory on linear quadratic regulators (Bertsekas, 2005).

_{k}*u*that optimizes this cost turns out to be a linear, time-invariant function of the state

_{k}*x*(Bertsekas, 2005): In the full closed-loop simulation, this control policy, equation 2.7, is coupled with the actual nonguassian neural signal model and true (not necessarily time-invariant) neuroprosthetic device equations (described in the previous two sections) to simulate closed-loop BMI-controlled cursor movements toward a target in a two-dimensional work space.

_{k}Four caveats and implementation details are helpful regarding this abbreviated description of the neural control network, more fully described in the appendix. First, note that the use of optimal control results in a neural control network model that represents performance when learning is complete. In fact, our controllers are only approximately optimal, because our simulated primary motor cortical neurons emit point-process signals that the controller approximates as gaussian. We also briefly examine the effect of exact gaussian neural activity in Figure 3B (which more reasonably approximates local and distant field potentials rather than spikes), where our controller is exactly optimal.

Second, while feedback to the user in standard experiments is available in time steps determined by the screen refresh rate, the various BMI algorithms (KF, OLE, PVA) update at intervals determined by the bin width, which is substantially longer. To capture this constraint in the model, the controller was afforded the capability to project state dynamics over 5 ms steps, which was at a finer resolution than the smallest decoding bin width of 25 ms. While allowing for a more realistic model, accounting for feedback at finer temporal scales than the decoding bin width did not change the stated trends. The appendix describes our solution to this problem in full.

Third, our model also includes a nonzero reaction time of 200 ms. Although
the choice of reaction time had no bearing on the major trends, a nonzero
reaction time was needed to match the general magnitude of mean integrated
distance to target (MID) seen in experimental data. As would be expected from
the definition of MID, finite reaction time increases the minimum achievable MID
for any condition, which is the *y*-intercept of the MID versus
bin width trend.

Fourth, although our model does not assert a particular anatomical location for the neural control network, experimental data have demonstrated changes in patterns of activity from neurons that are not directly connected to recording electrodes (Ganguly et al., 2011). In fact, the few neurons involved in directly modulating the neuroprosthetic device are connected indirectly to a network of millions of neurons that could subserve this role. Our neural control network model implements feedback control with a level of abstraction that allows clarity in analysis while sacrificing the cellular resolution provided by a large-scale biophysical model.

### 2.4. Parameter Selection and Sensitivity.

Parameters of our model were held constant across the various analyses rather than being individually optimized for each separate analysis. Parameters of the neuroprosthetic device and primary motor cortex were fully determined by the neural signal model, which in turn is determined from the literature (Cunningham et al., 2011; Moran & Schwartz, 1999; Srinivasan et al., 2006; Truccolo et al., 2005). Remaining parameters included (, , within the cost function of the neural control network and a deterministic reaction time parameter. Equations 2.3 to 2.5 define the cost function in terms of these parameters. We fixed , 1.8, and reaction time = 200 ms for simplicity, reducing our model to a total of 1 free parameter, namely, .

The and parameters determine the importance of achieving the target
location at zero velocity. The parameter indirectly determines the importance of energy
dissipated from spiking activity by penalizing intended velocity
(*u _{k}*). Balance between these parameters is needed to produce both
goal-directed reaching movements and realistic firing rates. As described in
appendix E, the trends reported in this
study hold over four orders of magnitude in .

The deterministic reaction time parameter modulated the *y*-intercept offset of MID score in Figure 2B. We found it necessary to introduce reaction time
(set to 200 ms in this analysis) to match the magnitude of MID values
reported from experimental data. However, the choice of this parameter had no
bearing on the relationship of closed-loop performance with bin width, the
relevant trend in this analysis.

### 2.5. Reach Task.

As a basis for comparison against experimental data, we employed a standard center-out reach task reported in a recent study relating closed-loop performance and bin width (Cunningham et al., 2011). Rather than initiating center-out movements, we equivalently required out-to-center movements for mathematical simplicity, without loss of generality. Starting points were drawn uniformly on a circle of radius 8 cm. The target zone was a square measuring 4 cm in width, centered on the origin. Successful target acquisition required a hold period, where the cursor was to be held within the target zone for at least 500 ms. Each trial was terminated after target acquisition, or at 3 seconds, whichever came first. These conditions were identical to the corresponding experimental setup (Cunningham et al., 2011) except that our work space was 2D rather than 3D.

## 3. Results

We studied the behavior of our model for closed-loop BMI operation using 96 simulated neurons in a target acquisition task designed to reproduce standard testing conditions (Cunningham et al., 2011). Details of the model implementation and task were provided in section 2. Here, we report the results of our analysis.

### 3.1. Open-Loop and Closed-Loop Performance Versus Bin Width.

As a preliminary analysis, we compared single trials of open-loop versus closed-loop performance in the reach task using a Kalman filter (KF) model of the neuroprosthetic device (see section 2). Figure 1B demonstrates sample paths for two trials. Cursor movements begin on the circular periphery in an attempt to approach the origin within the target zone. Open-loop trajectories demonstrate a meandering path. In comparison, using the same population of neurons, closed-loop control affects more directed paths that acquire the target.

We then used mean integrated distance to target (MID) to quantify open- and closed-loop performance (Cunningham et al., 2011). MID is computed by averaging the sequence of distances measured from the cursor to the target over the duration of the trial. All trials were incorporated into the MID calculation, including those with unsuccessful target acquisition. In concordance with experimental data, our model demonstrates that closed-loop mode outperforms open-loop mode (see Figure 2B).

Subsequently, we systematically varied bin width to examine its effect on closed-loop performance in the reach task (see Figure 2A). Our results show deterioration in closed-loop MID with increasing bin width (see Figure 2B), as well increasing time to target (see Figure 3A) and target acquisition failure rate (see Figure 3B). These trends concur with the experimental data (Cunningham et al., 2011). There is also a similar trend in the open-loop performance, although first-order trends in the experimental data on open-loop performance versus bin width are ambiguous (Cunningham et al., 2011). Moreover, we were unable to reproduce the strongly quadratic-appearing trends demonstrated in some examples of open-loop MID versus bin width (Cunningham et al., 2011).

To understand why closed-loop performance depends on bin width, we examined two possible explanations, motivated by inspection of the sample trajectories (see Figure 2A). First, this dependence might represent an intrinsic property of closed-loop control, even in the absence of neural signal noise. Surprisingly, the zero-noise condition also exhibited a linear trend in MID versus bin width, almost identical to the original model (see Figure 3C), although with significantly decreased time to target (see Figure 3D). In plots of the cursor speed (see Figure 2C) with noise and (see Figure 2F) without noise, as well as distance to target versus time (see Figure 2D) with noise and (see Figure 2G) without noise, smaller bin widths allow faster initial acceleration in cursor velocity and smoother deceleration, resulting in a faster approach to target and lower MID. Speed and distance profiles reflect a 200 ms deterministic reaction time (see section 2). Note that neurons are permitted to transition instantly between various firing rates (see Figure 2E).

Second, closed-loop performance deterioration with increasing bin width may also reflect the integration of velocity decoding errors over longer bin widths. A countervailing force in this explanation is that larger bin widths might better conform to assumptions by the KF that spike counts are gaussian. To determine which effect dominated in this model, we performed open-loop decodes over the duration of the bin width, calculating root mean squared error in the final position at the end of the bin width. The resulting trend (see Figure 2H) demonstrates that longer bin width results in larger errors in cursor position at the end of the bin width period.

As a technical point, note that the cursor exhibits movement prior to the 200 ms reaction time (see Figure 2C). This is because before the reaction time, spiking at the baseline firing rate is decoded by the filter into a nonzero velocity. Moreover, this prereaction-time decoded velocity exceeds velocities achieved at the end of the trial. This occurs because at the end of the trial, the controller actively compensates to achieve zero velocity as the cursor gets closer to the goal, whereas this capability is not engaged prior to the reaction time. Nevertheless, high velocities in the first 200 ms reaction period do not manifest in the cursor position trajectory (see Figure 2D) because those velocities are undirected on average.

### 3.2. Decoder Bias in Open Versus Closed-Loop Behavior.

Experiments with BMI have also suggested that closed-loop control can partially compensate decoder bias, which is systematic error in the decoded velocity (Chase et al., 2009; Jarosiewicz et al., 2008; Koyama et al., 2010). To examine this possibility, we compared open-loop versus closed-loop behavior in the PVA (Georgopoulos et al., 1986), which is known to be a biased variation on the OLE (Chase et al., 2009). Error in the decoded angle of movement was averaged over multiple trials for each of eight movement directions. As a matter of convenience, this was achieved by averaging error in the first time step across 1000 trials, separately for each starting point. A plot of decoding bias versus movement direction (see Figure 4A) demonstrates that PVA biases were reproducibly reduced in closed-loop mode relative to open-loop mode. The OLE in contrast, demonstrated no significant change in decoder bias, which was lower than the PVA in either case.

We sought an explanation for how closed-loop control reduces decoder bias. The mathematical analysis (see appendix C) reveals that the controller is explicitly capable of adjusting intended velocities to compensate for decoder bias. However, the analysis also predicts that residual bias can persist in closed-loop mode due to two factors. First, nongaussian neural activity violates assumptions made by the controller, so that even the OLE demonstrates decoder bias (see Figure 4A). Second, cost associated with intended velocity (and consequently motor cortical neuron firing rates) should prevent the neural control network from perfectly compensating PVA bias.

To validate the predictions on persistence of bias made by this analysis, we reexamined decoder bias with neural activity described as a gaussian process rather than an inhomogeneous Poisson process, using the identical mean and variance in spike counts from the original model. Under these conditions (see Figure 4B), decoder bias in the OLE disappeared. In contrast, bias in the closed-loop PVA persisted, presumably reflecting the energetic cost of intended velocities needed to compensate decoder bias. To confirm this intuition, we removed cost associated with intended velocity under gaussian neural signals. This allowed the neural control network to bring closed-loop PVA bias to zero (see Figure 4C).

### 3.3. Tuning Curves in Arm Versus BMI Control.

Neural ensembles in motor cortex demonstrate velocity tuning curves that shift
preferred direction between arm control and closed-loop BMI control (Carmena et
al., 2003; Ganguly & Carmena, 2009; Ganguly et al., 2011; Jarosiewicz et al., 2008; Taylor et al., 2002) as well as between closed-loop operation of different BMI
algorithms (Ganguly et al., 2011). These
studies have suggested that tuning curve shifts represent skill learning through
neuronal adaptation to the BMI algorithm, although systematic shifts in
particular directions have not been consistently demonstrated across
publications. To understand precisely why tuning curve shifts manifest during
proficient BMI control, we compared tuning curves during stochastic optimal PVA
and OLE control with the original motor cortical tuning curves, which reflect
arm control. Directional tuning curves during BMI control were determined by
regressing externally observable intended cursor direction (given by target
direction relative to the current cursor position) against spiking activity with
the cosine-tuned model form used in Cunningham et al. (2011). It is important to note that externally
observable intended cursor direction may be different from the direction of *u _{k}*, the internal representation of intended velocity.

We studied preferred direction under BMI and arm control in polar coordinates under various conditions (see Figure 5). These plots represent typical tuning curve shifts, with a progression from large shifts (see Figure 5A) to imperceptible shifts (see Figure 5D). Our analysis reveals that shifts reflect a control policy within the brain that redirects neuronal activity to compensate decoder bias. As a result, decoders with progressively less bias demonstrate decreased tuning curve shift. For example, the PVA with 96 neurons (see Figure 5C) demonstrates less shift than the PVA with 10 neurons (see Figure 5B). Similarly, the OLE with 96 neurons (see Figure 5D) demonstrates less shift than the PVA with 96 neurons (see Figure 5C).

We also see that costs on control influence the magnitude of tuning curve shifts. Tuning curves without control costs (see Figure 5A) demonstrate larger shifts than equivalent tuning curves with control costs in effect (see Figure 5B). To understand this, recall (see Figure 4) that costs on control prevent the brain from entirely compensating decoder bias. When the brain only partially compensates decoder bias, its control strategy under arm and BMI control are more similar, resulting in smaller tuning curve shifts.

Finally, the direction of tuning curve shifts is determined by the inverse of the BMI function, which maps cursor movement onto firing rate. In the process of exerting closed-loop control, the brain's tuning curves shift toward the dominant eigenvector of this mapping (see Figures 5A and 5B). When there is no dominant eigenvector (such as the PVA constructed under large numbers of neurons), the direction of shift is less systematic. In this case, tuning curves are pulled toward a multiplicity of eigenvectors (see Figure 4D). Implications of these results are described below.

## 4. Discussion

To operate a BMI, the brain must solve the control of a system that differs from the arm in natural movement. This problem is made more challenging by a highly constrained channel for control, embodied by a few tens of output neurons rather than the millions that form the corticospinal tract in health. BMI movements represent the combined effect of a neural controller and the BMI algorithm. Understanding neuroprosthetic behavior requires a coherent theory of these two systems and their interaction.

In this analysis, we showed that stochastic optimal control provides a compact description of closed-loop BMI. Control theory is a natural choice to describe the brain in its capacity to adjust neural output in response to sensory feedback. The principle of optimality represents proficiency, demonstrating performance after learning. The stochastic component of this theory incorporates the intrinsic randomness present in neural output. Our model is challenged to work within the constraints of a real BMI system: a noisy neural output channel, biased decoders, and discrete bin width decoding algorithms. The resulting theory for closed-loop BMI posits explanations for why performance deteriorates with increasing bin width, how decoder bias diminishes under closed loop control, and why tuning curves differ between arm movement versus BMI operation. Below, we outline various testable predictions, limitations of the model, use of the model as a simulation platform for BMI algorithm development, and implications for BMI algorithm design.

### 4.1. Testable Predictions of BMI Based on the Stochastic Optimal Control Model.

Three testable predictions of closed-loop BMI operation emerge from the stochastic control model.

First, our model predicts that even BMI controlled with hundreds or thousands of neurons would suffer in performance with increasing bin width. This follows from our analysis that the dependence of closed-loop performance on bin width, studied rigorously in previous work (Cunningham et al., 2011), persists even when neural signal noise is completely removed. By examining speed and position versus time, we demonstrated that smaller bin widths allow the controller to achieve faster, more transient velocities, even in the absence of neuronal noise. Although smaller bin widths favored the Kalman filter in our testing regime, the degree of model mismatch may be a countervailing or positively reinforcing factor in performance with small bin width in any specific decoder. For example, prior work suggests that point-process models may actually decrease model mismatch in single-millisecond-scale bin widths versus a gaussian counterpart (Eden et al., 2004).

Second, although the brain compensates biased decoders in closed-loop experiments (Chase et al., 2009; Jarosiewicz et al., 2008; Koyama et al., 2010), the model predicts that this capability does not necessarily come for free (see Figure 4). In our example with nongaussian neural activity, discrepancies between the PVA and OLE in terms of bias in decoded movement direction are decreased in closed-loop mode in comparison with open-loop mode. Nevertheless, PVA bias compensation is not complete, even in the case of pure gaussian neural observations where the OLE has no residual direction bias. PVA bias is fully compensated by entirely removing cost associated with intended velocity. This suggests that in practice, fully compensating PVA bias results in increased cost associated with intended velocity. In the full spike-based closed-loop BMI, this cost corresponds to dissipating chemical energy in the production of action potentials at the motor cortical output layer. While the brain in closed loop may be capable of mitigating certain decoder deficiencies, the energetic demand of this compensation to the patient may be a necessary design consideration. Comparison of total spike output in fully trained BMI control with the OLE versus biased PVA is an initial test of this prediction that could be performed retrospectively.

Third, our model predicts decoder-dependent tuning curve shift. This means that tuning curve shifts are expected to depend on the extent of decoder bias in reconstructing intended movement directions (see Figure 5C versus 5D). Several experimental reports have demonstrated that tuning curves can shift between arm movement and BMI control (Carmena et al., 2003; Ganguly & Carmena, 2009; Ganguly et al., 2011; Jarosiewicz et al., 2008; Taylor et al., 2002), as well as between different BMI. Our analysis suggests that changes in output neuron tuning curves in the learned state reflect the closed-loop control policy implemented by the brain. In other words, this shift could be a necessary consequence of proficient BMI control that involves different plant dynamics from the arm and varies across different BMI. Decoder-dependent shift is a testable consequence of this conclusion.

As a specific example of decoder-dependent tuning curve shift, the model suggests that tuning curve shifts are expected to differ between PVA algorithms that use differing numbers of neurons. Our eigenvector analysis demonstrates that PVA tuning curves shift toward the dominant eigenvector of the inverse PVA mapping (see Figures 5A–5C). When more neurons with randomly chosen preferred directions are introduced into the PVA mapping (see Figure 5C), the dominant eigenvector is less influential, resulting in smaller shifts. In other words, if preferred directions are more uniformly distributed, no single eigenvector will dominate, resulting in smaller, more nonsystematic rotations in preferred direction.

As another example of decoder-dependent tuning curve shift, the model suggests that tuning curve shifts will defer between the PVA and OLE methods (see Figure 5). This difference could be explained by the fact that the OLE represents a nearly unbiased estimator, unlike the PVA with a limited number of neurons (see Figure 4). In an experimental validation of this prediction, we expect that the OLE will elicit more significant tuning curve shifts than illustrated here, because real neurons will violate the idealized neural signal model (see equation 2.1), introducing decoder bias into the OLE.

### 4.2. Limitations of the Stochastic Control Model in Describing BMI.

Here, we examine various limitations of stochastic control in describing closed-loop BMI operation.

Is the behavior of this model invariant to the choice of cost function? Our parameter sensitivity analysis demonstrates that experimental trends reproduced in Figures 1, 2, 4, and 5 are robust across multiple orders of magnitude in parameters of our quadratic cost function. This cost function shapes the behavior of our system by penalizing for action potential production, distance from cursor to target, and cursor velocity. These three components allow realistic firing rates, goal-directed movement, and cursor movements that stop at the target. However, our analysis does not assert uniqueness: other cost functions may result in similar behavior. Nevertheless, this model provides a starting point for subsequent work to investigate which costs are most important to the biological system in operating a BMI. This future direction will require additional technical development related to adaptive estimation of model parameters from empirical data, as well as model quality measures. Moreover, it is not immediately obvious that measurable experiment data will be sufficient to discriminate between meaningfully different cost functions; this problem may be ill posed, although this statement is speculative.

Although our model reproduces a linear closed-loop MID versus bin width trend reported previously (Cunningham et al., 2011), it does not reproduce the associated quadratic open-loop trend. Does this open-loop discrepancy reflect differences between data statistics in these two studies? The difference in the open-loop decoding curve can arise from one (or both) of two sources: the neural signal model and the class of movement trajectories that are open-loop decoded. Our present work and Cunningham et al. (2011) employ identical neural signal models. As a result, the source of this discrepancy can be attributed to differences in the open-loop trajectories. Trajectories reconstructed in open loop by the Cunningham model are three-dimensional arm movements of a human subject, which may have resulted in more complex paths than the simple, directed, two-dimensional point-to-point movements used in our open-loop decoding procedure. This would explain the onset of open-loop performance degradation in Cunningham et al. (2011) with smaller bin widths, which results in a quadratic trend.

This difference in open-loop MID curves does not propagate into differences in closed-loop MID curves because these curves are calculated differently, although they share the same neural signal model. The open-loop decoder operates on neural activity from a reach state equation, while the closed-loop decoder operates on neural activity from the simulated neural control network. Because the closed-loop and open-loop MID curves are calculated differently, it follows that one MID curve might reproduce (Cunningham et al., 2011), while the other would not. Our MID results suggest that the neural control network is a reasonable approximation to the closed-loop strategy of BMI subjects in Cunningham et al. (2011), while the reach state equation may be overly simple relative to the trajectories employed by Cunningham et al. (2011) in open-loop decoding.

Our model allows discontinuous changes in neuronal firing, resulting in unrealistic step-like profiles in firing rate (see Figure 2E). Would an additional cost on changes in neural output, aimed at smoothing these firing rate profiles, significantly alter the core behaviors of our closed-loop model? Although a comprehensive analysis is needed to definitively address this question, a preliminary examination suggests this might not be the case. In Figure 2, we showed that a linear increase in MID versus bin width resulted from the propagation of velocity decoding errors over longer time periods (see Figure 2H), as well as coarsened speed profiles even in the absence of decoding error (see Figure 2F). The additional cost of changes in neuronal output during closed-loop operation would have no bearing on Figure 2H, which represents a one-time-step open-loop decoding property. The additional cost would, however, result in more gradually rising speed profiles (see Figure 2F). Because of the open-loop decoding effect (see Figure 2H), the relative differences between various bin widths would likely persist with the dampened rise in speed. The combined result of dampened speed profiles and the open-loop decoding effect might actually potentiate the MID versus bin width trend. On a more speculative note, it may be the case that gradual changes in neuronal firing rates represent an emergent property of closed-loop behavior in specific experimental paradigms rather than intrinsic limitations of neural activity, which can demonstrate rapid firing rate changes relative to 25 to 300 ms bins, for example, in transition from stasis to movement (Richardson, Borghi, & Bizzi, 2012).

Although the optimality principle represents proficient BMI operation, the model does not explain the process and dynamics of learning to control the BMI. In subsequent work, connections between our framework and the adaptive control perspective with iteratively tuned parameters (Heliot et al., 2010) will provide a clearer understanding of the dynamics of BMI in learning. Additionally, our controller is provably optimal for gaussian observation processes (such as field potentials) but only adequate (and not proven optimal) for the inhomogeneous Poisson process model of motor cortical spiking. Related to this point, alternate control strategies may be needed to analyze point-process filters (Eden et al., 2004) and adaptive filters (Dangi, Gowda, Heliot, & Carmena, 2011; Eden et al., 2004) that adjust online to changing neural parameters. These filters are time varying (and in some cases, nonlinear), where our neural control network was constructed for linear and time-invariant cases: the steady-state KF, OLE, and PVA. Uncertainty in neural representations of sensory feedback is another unexplored theme in the theoretical analysis of BMI, where new hardware techniques are emerging for the direct patterning of neural activity (Wang et al., 2012).

Could our model explain differential changes in preferred direction among perturbed and nonperturbed units as described in Jarosiewicz et al. (2008)? It is unlikely that the model in its current form, with the neural control network as a single monolithic feedback controller, could accommodate this more complex behavior. Reproducing this additional data might require invoking the brain's capability for implementing multiple processes that separately control subpopulations of output motor neurons, although this statement is speculative. In future work, our model formulation could be explicitly refined against this additional experimentally documented behavior, such as by comparison to the iterative parameter tuning model for BMI learning reported previously (Heliot et al., 2010), that appears to have been successful in this regard. Although our analysis predicts that shifting occurs to a variable extent across neurons based on distance between the preferred direction and inverse mapping eigenvector (see Figure 4), this emergent phenomenon of the model may be unrelated to the differential changes in preferred direction documented in Jarosiewicz et al. (2008) between perturbed and nonperturbed neurons. This question is unresolved in this letter and requires additional examination. Additional perspective on shifting tuning curves may be obtained by empirically documenting the effects of metabolic and attentional states on neuronal activity at the output layer.

### 4.3. Simulation-Based BMI Development with the Stochastic Control Model.

The stochastic control model may assist in future BMI algorithm development. Simulation of closed-loop performance is a beneficial complement to the experimental pipeline for BMI validation because it offers a fast path for testing and debugging. Simulation is similarly used in other fields where empirical validation is costly, including aeronautics, bridge construction, and communication systems. To date, there has been limited use of simulation in BMI development, such as with open-loop decoding using point-process neural signal models (Kemere & Meng, 2005; Shanechi, Wornell, Williams, & Brown, 2010; Srinivasan et al., 2006). This may be in part because open-loop simulation does not embody the human-in-the-loop aspect of closed-loop BMI operation. Recent work demonstrated that closed-loop operation can be simulated by tracking overt arm movements of a healthy human in the loop (Cunningham et al., 2011), as well as by iteratively tuned control through gradient descent (Heliot et al., 2010). Our work expands the path toward computer-based simulation of proficient closed-loop control, which may complement the use of iteratively tuned control to simulate the BMI learning process. The next phase of development in BMI algorithms is beginning to consider closed-loop effects (Dangi et al., 2011; Gage, Ludwig, Otto, Ionides, & Kipke, 2005; Gilja et al., 2010; Golub, Yu, Schwartz, & Chase, 2011; Jarzebowski, Srinivasan, & Coleman, 2008; Ma, Aghasadeghi, Jarzebowski, Bretl, & Coleman, 2012; Orsborn, Dangi, Moorman, & Carmena, 2012). In select cases, the possibility of fast, noniterative closed-loop simulation demonstrated by our modeling approach may provide an efficient choice for preclinical validation. An important limitation of this letter in this regard is that we have not specified a generic method to compute an approximately optimal (or at least, proficient) control policy for every possible neuroprosthetic device, for which additional work is needed. A solution may involve an efficient adaptation of the iterative-tuning perspective (Heliot et al., 2010).

Some researchers have suggested in passing that open-loop decoding may be a poor surrogate for closed-loop performance in determining the relative importance of various decoding strategies (Cunningham et al., 2011). Our results temper this notion, demonstrating that poor open-loop performance directly degrades closed-loop performance (see Figure 2H). Moreover, we show that compensating for biased decoders is not complete or free of cost (see Figure 4). This view is more in line with previous work, which concludes that some advantages of a given BMI algorithm seen in open-loop mode (although not all) are expected to persist in closed-loop mode (Chase et al., 2009).

### 4.4. Implications for BMI Algorithm Design.

While our letter does not aim to propose a new BMI algorithm, the work presented here contributes to the theoretical basis for new BMI algorithms focused on optimizing closed-loop performance. In order to apply our results to BMI controller design, it is important to recognize a stark division among the various existing BMI algorithms that loosely draw on the thematic elements of closed-loop control. A critical examination of BMI algorithms that invoke the language of closed-loop control reveals two fundamentally distinct categories. The first category focuses on optimizing open-loop reconstruction error. The second category focuses on optimizing closed-loop performance, exploiting the potential of human-in-the-loop dynamics.

The first category, based on recursive Bayesian estimation, proposes state equations (latent variable models) that describe the dynamics of a controller operating a linear gaussian plant (Kemere & Meng, 2005; Shanechi et al., 2010; Srinivasan et al., 2006). Although these three papers are titled with different emphasis (“state space analysis,” Srinivasan et al., 2006; “feed-forward control,” Kemere & Meng, 2005; “feedback-control,” Shanechi et al., 2010), they all propose nearly mathematically equivalent state-space models that represent closed-loop controllers on linear-gaussian plants. The “feedforward control” title in Kemere and Meng (2005) is particularly understated in that the authors actually develop a linear feedback control policy as the basis for their state equation (equation 8). There is little practical resulting mathematical difference between the various methods. Earlier work (Diedrichsen et al., 2009; Scott, 2004; Todorov, 2004; Todorov & Jordan, 2002) provides the theoretical justification for the premise of these algorithms that the brain is a closed-loop controller in the operation of limb movement. This category of algorithms is geared to optimize trajectory reconstruction error, described in the experimental BMI literature as “open-loop decoding error.” Despite modeling closed-loop dynamics, these papers do not exploit the closed-loop dynamics of the human operating the actual brain-machine interface itself. For example, Shanechi et al. (2010) employ a state equation that models the closed-loop control of a muscle-like plant rather than the control of the BMI itself. In fact, these authors choose to illustrate the capability of their resulting method in offline decoding from synthetic neural data, where an algorithm that explicitly incorporated human control of the BMI itself would inherently need closed-loop validation. Partly as a result of a BMI design philosophy rooted in estimating offline arm movements, all three methods (Kemere & Meng, 2005; Shanechi et al., 2010; Srinivasan et al., 2006) loosely invoke the language of control without actually exploiting the potential of human-in-the-loop dynamics.

In contrast, the second, newer, category of BMI algorithms attempts to exploit the brain's dynamics as a closed-loop controller in order to optimize closed-loop performance (Dangi et al., 2011; Gage et al., 2005; Gilja et al., 2010; Golub et al., 2011; Jarzebowski et al., 2008; Orsborn et al., 2012). Our letter is relevant to this second category of BMI algorithms: it provides theoretical justification for their premise that the brain is a closed-loop controller in the operation of BMI algorithms. Early work by Jarzebowski et al. (2008) redesigned the P300-based speller by explicitly modeling the brain as a closed-loop controller, resulting in a provably optimal strategy for dynamically adjusting alphanumeric menus (sensory feedback). Separate work by Gilja et al. (2010), and subsequently Dangi et al. (2011) and Orsborn et al. (2012) examined modified “cursorGoal” state equations that incorporate the current state of the actuator (an on-screen pointer) in the context of a recursive Bayesian decoder. Newer work along these lines incorporates feedback information theory (Omar et al., 2011) and team decision theory (Kim & Coleman, 2011) in the closed-loop design of BMI algorithms. Most recently, Golub et al. (2011) proposed a BMI algorithm that demonstrates particularly poor open-loop decoding error but enhanced closed-loop control relative to other competing methods that have better open-loop decoding error. The development of a channel selection strategy that incorporates the closed-loop perspective may be another opportunity for revision to existing BMI algorithm theory (see section E.6). The various new strategies discussed here (and likely others beyond this limited literature search) represent exciting new efforts to exploit the dynamics and limitations of learning (as frequently described in the neuroscience literature) and signal generation from the biological neural systems (humans) that control them.

## Glossary

Unless otherwise noted, capital variables used in formulas are matrices and lowercase variables are vector random variables or scalar constants, as defined below.

Name . | Dimension . | Description . |
---|---|---|

T | Symbol | Denotes matrix transpose |

N | Scalar | Number of neurons |

k | Scalar | Index representing discrete time |

M | Scalar | Number of dimensions in the state. In this study, there are five
parameters (see x) _{k} |

K | Scalar | Number of dimensions in the control input, here K = 2 (see u) _{k} |

Scalar | The time step used in discretizing the simulation | |

p, _{x}p _{y} | Scalar | x, y position of the cursor |

v, _{x}v _{y} | Scalar | x, y velocity of the cursor |

x _{k} | M × 1 | State of the closed-loop system, incorporating various kinematics of
the cursor. Equivalent to the output of the neuroprosthetic device
decoder. In this study, M = 5 to accommodate
2D cursor position, 2D cursor velocity, and the scalar 1 to allow
affine functions of the state |

n _{k} | N × 1 | Outputs of the observation process, that is, a vector of the binned
spike counts from all neurons in the ensemble at time step k |

Name . | Dimension . | Description . |
---|---|---|

T | Symbol | Denotes matrix transpose |

N | Scalar | Number of neurons |

k | Scalar | Index representing discrete time |

M | Scalar | Number of dimensions in the state. In this study, there are five
parameters (see x) _{k} |

K | Scalar | Number of dimensions in the control input, here K = 2 (see u) _{k} |

Scalar | The time step used in discretizing the simulation | |

p, _{x}p _{y} | Scalar | x, y position of the cursor |

v, _{x}v _{y} | Scalar | x, y velocity of the cursor |

x _{k} | M × 1 | State of the closed-loop system, incorporating various kinematics of
the cursor. Equivalent to the output of the neuroprosthetic device
decoder. In this study, M = 5 to accommodate
2D cursor position, 2D cursor velocity, and the scalar 1 to allow
affine functions of the state |

n _{k} | N × 1 | Outputs of the observation process, that is, a vector of the binned
spike counts from all neurons in the ensemble at time step k |

Neural Signal Model Parameters (see equation A.1) . | |
---|---|

N (number of neurons) | 96 |

c (baseline firing rate, in spikes/sec) _{i} | 10 |

b (preferred direction parameters, in units of _{i} | where |

Neural Signal Model Parameters (see equation A.1) . | |
---|---|

N (number of neurons) | 96 |

c (baseline firing rate, in spikes/sec) _{i} | 10 |

b (preferred direction parameters, in units of _{i} | where |

Kalman Filter . | ||
---|---|---|

M × 1 | The Kalman filter's hidden (latent) state variable, equivalent to
the user's intentions for the cursor kinematics (2D position and
velocity). In this study, M = 5 to
accommodate 2D intended cursor position, 2D intended cursor
velocity, and the scalar 1 to allow affine functions of the
state. | |

K and _{k}K | M × N | “Kalman gain.” This matrix transforms observations
into changes in the posterior of the state vector. The Kalman gain
defines how much deviations in the predicted observations (the
“innovation” in estimation theory) translate to
deviations in the predicted state in calculating the posterior state
estimate. Generally this matrix (K) changes as the posterior variance is updated. The
steady-state Kalman gain (_{k}K) is the asymptotic
value of K. _{k} |

Kalman Filter | ||

M × 1 | Mean of the prediction density on user intention at time step k. | |

M × M | Covariance of the prediction density on user intention at time step k. | |

M × 1 | Mean of the posterior density on user intention at time step k. | |

M × M | Covariance of the posterior density on user intention at time step k. | |

F | M × M | State transition matrix used for the random walk state equation (equation B.2). |

H | N × M | Multiplies the state in the gaussian observation model (equation B.7). |

M × 1 | Zero-mean additive gaussian noise in the random walk state equation (latent variable model, equation B.2) used by the KF to describe how user intention is expected to evolve one step into the future. | |

q _{k} | N × 1 | Zero-mean additive gaussian noise in the observation equation (observation model, equation B.7) used by the KF to approximate how intended velocity drives neural activity. |

M × M | Covariance of random gaussian vector in the state equation, B.2. | |

N × N | Covariance of random gaussian vector q in the observation equation, B.7. _{k} | |

Optimal Linear Estimator (OLE) and Population Vector Algorithm (PVA) | ||

P | N × 2 | Preferred direction matrix, calculated in equation B.13. Each row contains the 2D normalized preferred direction for a different neuron. |

S | N × N | Normalizing matrix, combined with H to calculate P in equation B.13. |

N × 1 | Recentered and rescaled binned spiked counts | |

D | 2 × N | Matrix that maps observations to the decoded velocity, calculated differently for the OLE and PVA |

Modeling the Neural Control Network (Linear Quadratic Regulator) | ||

u _{k} | K × 1 | The control input that the controller applies to affect the state. In this study, the control input is the user's intention for the 2D velocity of the cursor. |

A | M × M | The state transition matrix in the model of the LQR. This is
typically not equal to F. |

B | M × K | Matrix that defines how control inputs affect the state |

w _{k} | M × 1 | Zero-mean gaussian noise added to the state at each step of LQR control. |

Q | M × M | Matrix that defines the quadratic form of the cost terms related to state. |

R | K × K | Matrix that defines the quadratic form of the cost terms related to control inputs |

L | K × M | Matrix defining the linear mapping from the current state to the optimal control input, where the optimal control policy is . |

J | Scalar | Number of sensory feedback steps for each decoding bin. Allows the model to represent the brain as it receives sensory feedback at the fixed refresh rate of the monitor while decoding bin width is independently varied. |

Neural Control Network (LQR) cost parameters (described in Equations C.4, C.10, C.11) | ||

Scalar | Weight for cursor distance from target. Unless otherwise specified, . | |

Scalar | Weight for nonzero cursor speed. Unless otherwise specified, . | |

Scalar | Weight for nonzero intended speed. Unless otherwise specified, . |

Kalman Filter . | ||
---|---|---|

M × 1 | The Kalman filter's hidden (latent) state variable, equivalent to
the user's intentions for the cursor kinematics (2D position and
velocity). In this study, M = 5 to
accommodate 2D intended cursor position, 2D intended cursor
velocity, and the scalar 1 to allow affine functions of the
state. | |

K and _{k}K | M × N | “Kalman gain.” This matrix transforms observations
into changes in the posterior of the state vector. The Kalman gain
defines how much deviations in the predicted observations (the
“innovation” in estimation theory) translate to
deviations in the predicted state in calculating the posterior state
estimate. Generally this matrix (K) changes as the posterior variance is updated. The
steady-state Kalman gain (_{k}K) is the asymptotic
value of K. _{k} |

Kalman Filter | ||

M × 1 | Mean of the prediction density on user intention at time step k. | |

M × M | Covariance of the prediction density on user intention at time step k. | |

M × 1 | Mean of the posterior density on user intention at time step k. | |

M × M | Covariance of the posterior density on user intention at time step k. | |

F | M × M | State transition matrix used for the random walk state equation (equation B.2). |

H | N × M | Multiplies the state in the gaussian observation model (equation B.7). |

M × 1 | Zero-mean additive gaussian noise in the random walk state equation (latent variable model, equation B.2) used by the KF to describe how user intention is expected to evolve one step into the future. | |

q _{k} | N × 1 | Zero-mean additive gaussian noise in the observation equation (observation model, equation B.7) used by the KF to approximate how intended velocity drives neural activity. |

M × M | Covariance of random gaussian vector in the state equation, B.2. | |

N × N | Covariance of random gaussian vector q in the observation equation, B.7. _{k} | |

Optimal Linear Estimator (OLE) and Population Vector Algorithm (PVA) | ||

P | N × 2 | Preferred direction matrix, calculated in equation B.13. Each row contains the 2D normalized preferred direction for a different neuron. |

S | N × N | Normalizing matrix, combined with H to calculate P in equation B.13. |

N × 1 | Recentered and rescaled binned spiked counts | |

D | 2 × N | Matrix that maps observations to the decoded velocity, calculated differently for the OLE and PVA |

Modeling the Neural Control Network (Linear Quadratic Regulator) | ||

u _{k} | K × 1 | The control input that the controller applies to affect the state. In this study, the control input is the user's intention for the 2D velocity of the cursor. |

A | M × M | The state transition matrix in the model of the LQR. This is
typically not equal to F. |

B | M × K | Matrix that defines how control inputs affect the state |

w _{k} | M × 1 | Zero-mean gaussian noise added to the state at each step of LQR control. |

Q | M × M | Matrix that defines the quadratic form of the cost terms related to state. |

R | K × K | Matrix that defines the quadratic form of the cost terms related to control inputs |

L | K × M | Matrix defining the linear mapping from the current state to the optimal control input, where the optimal control policy is . |

J | Scalar | Number of sensory feedback steps for each decoding bin. Allows the model to represent the brain as it receives sensory feedback at the fixed refresh rate of the monitor while decoding bin width is independently varied. |

Neural Control Network (LQR) cost parameters (described in Equations C.4, C.10, C.11) | ||

Scalar | Weight for cursor distance from target. Unless otherwise specified, . | |

Scalar | Weight for nonzero cursor speed. Unless otherwise specified, . | |

Scalar | Weight for nonzero intended speed. Unless otherwise specified, . |

## Appendix A: Modeling the Output Neural Signals

*i*has a preferred direction

*b*and a minimum firing rate

_{i}*c*. The instantaneous mean firing rate of neuron

_{i}*i*at time

*k*is then Based on this formula, the preferred direction of a neuron is the intended direction of movement that elicits maximal firing rate. For neuron

*i*, the number of spikes generated for a discrete amount of time is drawn from the following distribution: Parameters (see the Glossary) are chosen to produce realistic spiking rates, with a minimum firing rate of 10/s and a firing rate of 24/s if the cursor is moving at 20 cm/s along the preferred direction.

## Appendix B: Modeling the Neuroprosthetic Device

The conventional neuroprosthetic device implements a neural signal decoder that maps signals from output neurons into an estimate of the user's intention for the device. Our models for the neuroprosthetic device are identical to the implementations of three mainstream neural signal decoders: the Kalman filter (KF), optimal linear estimator (OLE), and population vector algorithm (PVA). In this section, we recapitulate the implementation of these decoders in the context of our closed-loop model.

### B.1. Kalman Filter.

*k*, and

*n*are neural signals that reflect those intentions. In our model, the device is simply a computer cursor, so represents intended kinematics (position , and velocity , in 2D): The constant 1 in this vector is used to implement an affine state equation. More generally, the state vector can incorporate any parameters that relate to the neuroprosthetic device, such as position, angles of joints, or force. In order to estimate from , the KF defines a state equation and an observation equation. We describe the meaning and implementation of these components below.

_{k}#### B.1.1. Defining the KF State Equation.

*F*is an fixed linear map and . Commonly, and in our implementation,

*F*is chosen to represent simple kinematics that propagates velocity into position at discrete time step widths seconds. For the 5 × 1 described in equation B.1,

*F*is given by

^{3}in the denominator to allow final units of cm

^{2}/s

^{2}in the corresponding entry. The value of these diagonals matches the order of magnitude seen in the changes of velocity from simulated reaches in the actual reach generation. The covariance was scaled with bin width , as described in equation B.4.

#### B.1.2. Defining the KF Observation Model.

*i*at time step

*k*as they depend on the intended velocity: where are identically and independently distributed across

*k*. Note that this observation model is mismatched because neural observations are actually generated as an inhomogeneous Poisson process (see equation A.2).

The parameters of this model are estimated from binned spiking activity using
linear regression. Specifically, cursor velocity is regressed against binned
spike count. Data for this regression are obtained from a training set of
eight simulated reaching movements. Reaches begin at evenly spaced points on
the circle (0, pi/4, pi/2, , 7pi/4) and end at the origin, generated by a linear
quadratic controller (LQR). Appendix C explains the LQR in detail. (For an introductory text on this
subject, see Bertsekas, 2005.) To
estimate the covariance , the mean of squared residuals of observed spikes from the
linear model is used as an estimate of the variance of each neuron's spike
generation. The covariance estimate is set to a diagonal matrix with the *i*th
neuron's estimated variance at the *i*th diagonal entry.

*H*matrix (dimensions N × M). Note the tilde is removed from

*a*,

*b*, and

*c*to indicate that estimates of these parameters (rather than their true values) are used to form

*H*from the linear regression: where

*H*relates the binned neural activity

*n*(dimensions N × 1) and the intended cursor kinematics (dimensions M × 1): Here

_{k}*q*is a zero-mean gaussian variable with covariance .

_{k}#### B.1.3. Kalman Filter Equations.

The KF combines the state and observation equations to estimate the user's
intention from output neural activity. Because this neural activity is a
noisy representation of the user's intention, the KF uses a gaussian
probability distribution (also called probability density) to represent its
estimate of intention given neural activity. The notations and are used to indicate the mean and variance of this
distribution, respectively, for the user's intention at time step *k*, given neural activity up to and including time step *j*. The KF alternates between two steps:

Prediction: compute and from and .

Update: compute and from and .

*n*_{k+1} is a vector of binned spike counts across the
neuronal ensemble. The matrix *K _{k}*, called the Kalman gain, defines the extent to which new
observations

*n*

_{k+1}affect the filter's estimate of intended cursor kinematics. A useful property of the Kalman gain is that it does not depend on the observations received. Under appropriate conditions, the Kalman gain converges over time, so that the entire filter, originally linear and time varying, becomes linear and time invariant. This property will be useful in constructing the neural control network that operates the KF neuroprosthetic device, as described in section C.3.1.

### B.2. Optimal Linear Estimator and Population Vector Algorithm.

An alternate discussion of the OLE and PVA is found elsewhere (Chase et al., 2009). We present these methods using
notation that is consistent with the remainder of our model. Both OLE and PVA
decode velocity at time step *k* through a linear transformation
of a recentered-and-rescaled vector of binned spike counts at time step *k*, without incorporating neural activity from previous time
steps.

*P*. In our notation, the preferred directions are encoded in the third and fourth columns of

*H*(N × 5) that relate to the velocity (see equation B.6). The relevant columns of

*H*can be extracted and normalized across rows to get the desired matrix

*P*:

*T*in this formula denotes the matrix transpose. The matrix to the right of

*H*extracts the third and fourth columns of

*H*. The matrix

*S*() is a diagonal matrix such that the rows of

*P*are normalized. Accordingly,

*S*

_{i,i}=[(

*H*

_{i,3})

^{2}+(

*H*

_{i,4})

^{2}]

^{−1/2}. In other words,

*S*

_{i,i}is the reciprocal of the

*L*

_{2}norm of the

*i*th row of . The recentered-and-rescaled vector of binned spike counts is given by Define additionally:

Using the respective *D* matrices, the OLE and PVA can be
summarized by two steps:

Move the current cursor position by the current cursor velocity times bin width .

Estimate the new cursor velocity at time step

*k*as .

Recall that an estimator is unbiased if the average difference between estimate
and true value is zero. Given valid model assumptions, the OLE is an unbiased
decoder. In contrast, the PVA is biased when the preferred directions in *P* are not distributed uniformly, resulting in systematic
differences between intended and decoded velocity during open-loop decoding, as
explored in Figure 4.

## Appendix C: Modeling the Neural Control Network

The neural control network is modeled as a stochastic optimal controller. The
stochastic optimal control model has been used extensively in the analysis of
natural arm movements. In our model, this controller uses sensory feedback on cursor
kinematics to drive primary motor cortical activity. The term *stochastic* refers to the controller's ability to perform
despite uncertainty in decoded cursor movements (in this case, resulting from output
neural activity). The term *optimal* refers to the minimization of
costs associated with cursor kinematics and intended velocity, explicitly rewarding
goal-directed cursor movements while indirectly conserving chemical energy related
to the production of action potentials.

The linear quadratic regulator (LQR) is one type of stochastic optimal controller, used for problems where the state to be controlled (in our case, cursor kinematics) evolves linearly with the addition of gaussian noise. Because our binned output neural activity is only approximately gaussian with linear neuroprosthetic devices (KF, OLE, PVA), the LQR framework provides an approximate stochastic optimal controller. Nevertheless, the LQR is a computationally tractable choice to model the neural control network that sufficiently replicates experimental data as described in the main text.

*x*to indicate that the relevant state to the control network is decoded 2D cursor position and velocity. The constant 1 is added to accommodate affine state equations.

_{k}*u*determined by the neural control network: The LQR also requires a state equation that describes how cursor kinematics

_{k}*x*are expected to evolve in time. Unlike the KF state equation (see section B.1.1), the plant state equation additionally specifies how control inputs

_{k}*u*from the neural control network influence cursor kinematics. In section C.1, we introduce equations that implement the LQR for a generic linear gaussian state equation. In section C.2, we describe how LQR behavior can be shaped to implement reaching movements of the neuroprosthetic cursor while conserving spikes. In section C.3, we specify how the generic LQR equations from section C.1 can be customized to control neuroprosthetic devices based on the KF, OLE, and PVA. In section C.4, we discuss how the final model represents sensory feedback that occurs at a timescale that is finer than the bin width of the neuroprosthetic device.

_{k}### C.1. Generic Equations for the Linear Quadratic Controller.

In this section, we introduce equations that implement the LQR for a generic linear gaussian state equation. (This material can also be found in an introductory text: Bertsekas, 2005.) The LQR gives an optimal control policy for linear systems with additive gaussian noise under a quadratic cost function.

*F*replaced by

*A*and replaced by

*w*. However, the plant state equation, C.3, includes an additional term for the control input

_{k}*u*that is decided by the neural control network. In the generic formulation, the LQR gives an optimal control policy on how to set the control inputs

_{k}*u*based on observations of the cursor

_{k}*x*such that the expected value of a quadratic cost function on both state and control inputs is minimized over all time. The formula for this cost is given by

_{k}*Q*and

*R*are square matrices that determine the relative costs associated with state and control, respectively. The optimal control policy given by the LQR derivation is a linear function of the state: where and

*G*is computed by solving the algebraic Ricatti equation: An approximate solution for

*G*is obtained by repeating several steps of the following recursion, until the sequence

*G*, converges to within some tolerance, where For this letter, we terminated the recursion when the Frobenius norm of

_{i}*G*

_{i+1}−

*G*was lower than 1e-7.

_{i}### C.2. Using the LQR Cost Function to Generate Reaching Behavior.

The cost function represents constraints of the reaching task. The LQR is used to
generate reaches toward the origin by setting the cost matrices *Q* and *R* such that the origin represents a
low-cost state. There are three costs:

—Penalizes squared distance to origin. This parameter incentivizes reaches toward the origin.

—Penalizes square of cursor velocity. This parameter incentivizes reaching and holding to zero velocity. Also, because kinetic energy is proportional to the square of velocity, relates to energy dissipated in applications that involve movement of a robotic arm or FES-enabled limb.

—Penalizes square of intended cursor speed. Because spiking rates increase with intended cursor speed, is effectively a cost associated with generating spiking activity.

### C.3. Customizing the Generic LQR Framework to Represent the Neural Control Network in Closed-Loop Neuroprosthetic Operation.

#### C.3.1. LQR Control of the Kalman Filter.

The state dynamics of the generic LQR model given in section 5.1 assumes the
neuroprosthetic device is time invariant, whereas the standard Kalman filter
is linear but time varying. This is a result of the Kalman gain *K _{k}* (see equation B.10), which is indexed by time. The Kalman gain can converge to
a steady state under certain common conditions, not discussed here, in part
because the evolution of the Kalman gain is also independent of the
observation process (Bertsekas, 2005;
Stengel, 1994). We estimate a
steady-state Kalman gain by running equations B.10 and B.12 forward until the Frobenius norm of
the change in the Kalman gain matrix is less than 1e-7. We denote this
steady-state Kalman gain by

*K*. The resulting steady-state KF is linear time invariant, which is fully compatible with the LQR model in section C.1.

*n*are distributed according to equation C.2 and the KF takes the form B.11, the evolution of the full plant (including output motor neurons and KF) can be expressed as a linear transformation of the state (

_{k}*x*, containing 2D cursor position and velocity) and control inputs (

_{k}*u*, containing user-intended velocity): Recall that

_{k}*p*and

_{x}*p*are actual cursor position,

_{y}*x*contains actual cursor position and velocity, and

_{k}*u*is user-intended cursor velocity. Equation C.14 can be separated into a sum of linear transformations on the state and the control input with additive gaussian noise: This matches the original form of the LQR formulation, where Note that equation C.16 differs from equation C.12 in that it does not completely discard the current estimate of velocity

_{k}*v*(because of the nonzero entries in the right-most matrix). This reflects the fact that the Kalman filter integrates over prior neural signal observations, in contrast to the OLE and PVA.

_{k}#### C.3.2. LQR Control of the Optimal Linear Estimator and Population Vector Algorithm.

*A*matrix in the state equation that is identical to the perfect decoder case (see equation C.12). Decoded velocity in both cases can be written as , where represents the recentered-and-rescaled firing rate vector as defined in equation B.14 and

*D*is selected differently for each of the two methods as described in equations B.15 and B.16. Combining the normalized observation model given in equation B.14 with the decoded velocity gives a formula that relates decoded velocity (

*v*) to intended velocity (

_{k}*u*): These calculations define the components of the plant state equation, C.3, for OLE and PVA: Recall from section B.2 that for the OLE,

_{k}*D*=(

*P*)

^{T}P^{−1}

*P*, where

^{T}*P*is given in equation B.13, resulting in . For the PVA, , so that . For the PVA, the fact that

*B*is not necessarily the identity (depending on the value of

*P*) embodies the notion that the PVA can be a biased decoder, depending on the population of neurons (via the value of P).

### C.4. Extending the LQR to Uncouple the Time Resolution of Sensory Feedback and Decoder Bin Width.

The LQR framework as described in the previous sections allows control inputs *u _{k}* and state

*x*to change simultaneously on every time step. In section 3, the behavior of decoders on different bin widths is analyzed. Under the standard LQR framework, if the decoder bin width is 300 ms, the controller can “perceive” states only every 300 ms. This problem is manifest in the dependence between cost function and bin width. For example, if the bin width is 25 ms, then the cost is evaluated on states every 25 ms. If the bin width is instead 300 ms, then the cost function is evaluated on states every 300 ms. In reality, the human receives cursor state updates at the screen refresh rate rather than the decoding rate, which is typically slower. In order to faithfully compare closed-loop behavior across various decoding bin widths, we needed the controller to receive cursor state at a constant rate (in our letter, 5 ms) independent of the decoding bin width. This is analogous to varying decoder bin width while the computer screen refresh rate is fixed at 200 Hz. Below, we describe how the model used in the main text uncouples the time resolution of sensory feedback and decoder bin width.

_{k}*J*times for every second drift period, where is also the decode bin width. Let The sequence of cursor states (position and velocity kinematics in 2D) in the drift period between decoder updates is . The cost in the state accumulated between decode periods (omitting the control cost) is then This can be expressed as a quadratic form of

*x*: The cost accumulated for the control input between decode periods is

_{k}*JR*.

*J*steps. Because the decoder equations are not changed by modeling sensory feedback at finer timescales, as defined in sections C.3.1 and C.3.2.

## Appendix D: Numerical Issues and Implementation Details

The Ricatti equation recursions for the LQR and the steady-state Kalman gain were terminated when the Frobenius norm of the difference between successive Kalman gains dropped below a threshold. As a caution, note that we had originally implemented a large, constant number of iterations, which did not consistently provide convergence.

The Kalman filter described in section B.1, a linear time-varying algorithm, was made time invariant (LTI) by using the steady-state Kalman gain. This was useful for deriving the appropriate LQR model to match a Kalman filter. However, using the LTI Kalman filter for decoding purposes initially generates inaccurate decodes, since that filter acts as if it has already received many observations from the past that have not actually occurred. To resolve this issue in our closed-loop simulations, the LQR derived from an LTI Kalman filter is used for the controller, whereas the decoder is the standard (linear, time-varying) Kalman filter.

## Appendix E: Parameter Sensitivity Analysis

Is our result robust? In other words, does our reproduction of experimental behavior (see Figures 2, 4, and 5) require delicate parameter tuning? This question is central to the impact of any theory, including this one. Given that parameters are likely to vary between subjects, we need to be able to demonstrate that our essential claimed model behaviors (see Figures 2, 4, and 5) are preserved across a wide range of parameter choices. Below, we demonstrate that our essential claimed behaviors are preserved across at least four orders of magnitude in parameter choice.

### E.1. Number of Parameters.

The model employs several different constants that relate to neural spiking properties, neural signal decoder equations, and control. Nearly all of these constants are derived from previously published experimental data. The neural spiking parameters are based on point-process models of motor cortical neurons that have been fit to primate data (Truccolo et al., 2005). The neural prosthetic decoder equations are constrained by the Kalman filter, PVA, and OLE algorithms, where parameters are fit by maximum likelihood applied to the simulated neural activity. The controller approximates the plant (which includes the output neural activity and neural prosthetic decoder equations) as a linear state equation that is also fit by maximum likelihood. Reaction time is a constant at 200 ms, within range of experimental behavior.

In the main text, is fixed at 1.8. Because cost is arbitrarily scaled, is our only free parameter. Note the physical meaning of this parameter: it embodies the extent to which the human expends energy (in the form of action potentials) versus attempting to achieve an accurate reaching movement with crisp stopping behavior. Having only one free parameter in the model greatly expedites the parameter sensitivity analysis, which consequently involves sweeping one parameter, , across several orders of magnitude.

### E.2. Establishing Whether Model Behavior Is Consistent with Experiment.

How do we determine in automated fashion if the essential behaviors displayed in figures 2, 4, and 5 are preserved across multiple orders of magnitude of ? The essential behaviors the model must reproduce are:

- •
Figure 2: Mean integrated distance (MID) increases with bin width in both open- and closed-loop control.

- •
Figure 4: Closed loop control results in a reduction of PVA bias versus open-loop control.

- •
Figure 5: Measured neuronal preferred directions shift between arm and closed-loop brain-machine interface control for the PVA, but shift is negligible for the OLE.

To automate the process of determining the validity of these claims across a wide number of , we conduct hypothesis tests as follows:

- •
Figures 6A and 6B, corresponding to Figure 2B in the main text:

*H*_{0}: slope of MID versus bin width = zero.*H*_{1}: slope > zero. The*p*value on the slope is calculated from the observed Fisher information matrix based on maximum likelihood estimation assuming MID is an affine gaussian function of bin width. This test is conducted on MID versus bin width graphs in closed-loop mode (see Figure 6A) and open-loop mode (see Figure 6B). - •
Figure 6C, corresponding to Figure 4A in the main text:

*H*_{0}: Bias in decoded angle is unchanged between open- and closed-loop control of the PVA.*H*_{1}: Bias decreases with closed-loop control of the PVA. A one-tailed Wilcoxon signed rank (nonparametric) test is applied. - •
Figures 6D and 6E, corresponding to Figures 5A and 5D, respectively:

*H*_{0}: No shift in neuronal preferred direction between arm control and brain-machine interface control.*H*_{1}: Nonzero shift with the PVA (see Figure 6D), but negligible shift with the OLE (see Figure 6E). A two-tailed Wilcoxon signed rank (nonparametric) test is applied.

With hypothesis tests described as above, Figure 6 reports the magnitude of the corresponding statistic, as well as
the *p* value, and a binary variable that indicates whether this *p* value meets a *p*<0.05 threshold for
statistical significance. is varied evenly on a logarithmic scale between 1e-3 and 1e1.
The end result is that essential trends in Figures to 1, 2, 4, and 5 are
reproduced across several orders of magnitude in .

### E.3. Generating Data Points for the Parameter Sensitivity Analysis.

Neural ensemble sizes of 96 neurons for Figures 6A to 6C and 10 neurons for Figures 6D and 6E were employed for consistency with Figures 2, 4, and 5. Neuron parameters for the entire ensemble were drawn once for each panel in Figure 6 and held fixed across multiple trials such that variation in trial statistics was driven by the randomness of point process neural outputs, but not by variation in neural parameters. was varied evenly on a logarithmic scale between 1e-3 and 1e1.

Figures 6A and 6B each use 100 MID trials per data point, where each of the six
data points represents a specific bin width. Every trial uses a newly drawn
ensemble of neurons. Six bin widths were examined, as in Figure 2B: 25,
50, 100, 200, 250, and 300 ms. The *p*-value represents
the statistical significance of the nonzero slope of the MID versus bin width
graph, computed separately for each ratio. The slope (and hence *p*-value) for each ratio is computed from a regression on 100 MID trials per data
point × 6 data points = 600 MID trials.

Figure 6C uses a single fixed ensemble of
neurons to calculate the bias in the first decoded velocity from 50,000 unique
reaching trials for each value. The *p*-value represents the
significance of reduction in PVA decoding bias between arm mode and closed-loop
mode.

### E.4. Tuning Curve Shifts Occur with Ensembles That Include Neurons of Variable Baseline Firing Rate and Depth of Modulation.

We confirmed that tuning curve shifts could be reproduced with an ensemble that included neurons of variable baseline firing rate and depth of modulation (see Figure 7). For 10 neurons, base firing rates are uniformly chosen from [2, 20] spikes/sec, and tuning depths (difference between maximum and minimum firing rate) are uniformly chosen from [2, 20] spikes/sec. Tuning curves again shift with this more diverse set of neurons (see Figure 7).

### E.5. Conclusion of Parameter Sensitivity Analysis.

### E.6. Implications for Channel Selection in BMI Performance Optimization.

Channel selection can have an impact on BMI performance in both open-loop decoding (Vargas-Irwin et al., 2010; Wahnoun, He, & Helms Tillery, 2006) and the resulting closed-loop performance (Wahnoun et al., 2006). Although our work here does not attempt to directly study this phenomenon, our analysis (see Figure 2H) explains how open-loop decoding quality could improve closed-loop performance. A channel selection process that improves open-loop decoding by detecting strong model mismatch might be expected to improve closed-loop performance by decreasing the magnitude of the unpredictable velocity decoding error that the user would need to correct at each time step. In contrast, our model suggests that when channel selection criteria focus on reducing predictable errors (bias), open-loop improvements might translate less effectively into skilled closed-loop performance, consistent with the guiding philosophy of earlier work (Chase et al., 2009).

## Acknowledgments

This work was funded in part by the American Heart Association Scientist Development Grant (11SDG7550015) and UCLA Radiology. We thank David Rutledge at the California Institute of Technology and Sanjoy Mitter at MIT for institutional support.

## References

## Author notes

Color versions of some figures in this article are presented in the supplemental material available online at http://www.mitpressjournals.org/doi/suppl/10.1162/NECO_a_00394.