## Abstract

It has been debated whether kinematic features, such as the number of peaks or decomposed submovements in a velocity profile, indicate the number of discrete motor impulses or result from a continuous control process. The debate is particularly relevant for tasks involving target perturbation, which can alter movement kinematics. To simulate such tasks, finite-horizon models require two preset movement durations to compute two control policies before and after the perturbation. Another model employs infinite- and finite-horizon formulations to determine, respectively, movement durations and control policies, which are updated every time step. We adopted an infinite-horizon optimal feedback control model that, unlike previous approaches, does not preset movement durations or use multiple control policies. It contains both control-dependent and independent noises in system dynamics, state-dependent and independent noises in sensory feedbacks, and different delays and noise levels for visual and proprioceptive feedbacks. We analytically derived an optimal solution that can be applied continuously to move an effector toward a target regardless of whether, when, or where the target jumps. This single policy produces different numbers of peaks and “submovements” in velocity profiles for different conditions and trials. Movements that are slower or perturbed later appear to have more submovements. The model is also consistent with the observation that subjects can perform the perturbation task even without detecting the target jump or seeing their hands during reaching. Finally, because the model incorporates Weber's law via a state representation relative to the target, it explains why initial and terminal visual feedback are, respectively, less and more effective in improving end-point accuracy. 
Our work suggests that the number of peaks or submovements in a velocity profile does not necessarily reflect the number of motor impulses and that the difference between initial and terminal feedback does not necessarily imply a transition between open- and closed-loop strategies.

## 1 Introduction

Submovement decomposition is a major class of methods in motor research that has been applied to analyze and compare various experimental conditions and subject groups (Flash & Henis, 1991; Milner, 1992; Lee, Port, & Georgopoulos, 1997; Krebs, Aisen, Volpe, & Hogan, 1999; Novak, Miller, & Houk, 2002). It assumes that a reaching movement is a superposition of primitives, each produced by a motor impulse or command. Typically a velocity profile is decomposed into a sum of shifted and scaled versions of a standard profile, and the number of fitted components (submovements) is interpreted as indicating the number of underlying discrete motor impulses or commands. An alternative view is that a continuously applied control policy can explain movement kinematics, including submovements. Indeed, stochastic optimal control models have been shown to produce movement kinematics similar to observed ones using continuous control signals without explicitly summing discrete submovements (Harris & Wolpert, 1998; Todorov & Jordan, 2002; Scott, 2004; Tanaka, Krakauer, & Qian, 2006; Shadmehr & Krakauer, 2008; Qian, Jiang, Jiang, & Mazzoni, 2013).
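The submovement view can be made concrete with a short sketch: a velocity profile is modeled as a sum of shifted and scaled standard profiles, with the minimum-jerk speed curve as a common choice of standard profile. The amplitudes, onsets, and durations below are purely illustrative assumptions, not values from any cited study:

```python
import numpy as np

def min_jerk_speed(t, onset, amplitude, duration):
    """Speed profile of one hypothetical minimum-jerk submovement.

    The curve integrates to `amplitude` over [onset, onset + duration]
    and is zero outside that window."""
    s = (t - onset) / duration
    v = 30.0 * amplitude / duration * s**2 * (1.0 - s)**2
    return np.where((s >= 0.0) & (s <= 1.0), v, 0.0)

# Superpose two submovements, as the discrete-impulse view assumes.
t = np.linspace(0.0, 1.0, 1001)
v = min_jerk_speed(t, 0.00, 0.5, 0.6) + min_jerk_speed(t, 0.35, 0.1, 0.5)

# Total distance traveled is the sum of the component amplitudes.
total_distance = np.sum(v) * (t[1] - t[0])
```

Depending on the relative timing of the components, the summed profile shows either two peaks or a single peak with a plateau, which is exactly the kind of feature that decomposition methods then attribute to discrete impulses.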

The target perturbation paradigm (Georgopoulos, Kalaska, & Massey, 1981; Soechting & Lacquaniti, 1983; Flash & Henis, 1991; Heath, Hodges, Chua, & Elliott, 1998) provides a particularly interesting case to contrast these two views. In this paradigm, subjects are instructed to reach with their hand to a target. At some point before or after movement onset, the target jumps to a new position, and subjects have to correct their movements to reach the new position. It is known that depending on the jump time, the perturbation can greatly change movement kinematics compared with the corresponding no-perturbation condition. In particular, the velocity profile varies with jump time and may show a plateau or a new peak. The submovement view contends that such changes in velocity profile indicate changes in the number of discrete motor commands or impulses for controlling the movements (Fishbach, Roy, Bastianen, Miller, & Houk, 2005, 2007). In contrast, the continuous-control view argues that there is no fundamental difference between the conditions with and without target perturbation (Pélisson, Prablanc, Goodale, & Jeannerod, 1986; Desmurget & Grafton, 2000); in both cases, the system estimates its state (using sensory feedback and internal prediction) to correct errors and move the effector toward the target in a continuous process. Although perturbation introduces a large error, it is corrected in the same way as errors in the no-perturbation case.

Optimal feedback control models have already been used to explain target-perturbation experiments. Some (Hoff & Arbib, 1993; Liu & Todorov, 2007) used finite-horizon formulations, which require a predetermined movement duration to define the cost function and solve the optimization problem (see the discussions in Tanaka et al., 2006 and Qian et al., 2013). To simulate target perturbation, they have to be given two movement durations (taken from experimental data) to compute two solutions for use before and after the target jump, respectively. Consequently, the models are forced to use two control policies to simulate target perturbation experiments. Although each policy is run continuously, the requirement of two policies and a switch between them weaken the argument for the continuous-control view. Moreover, requiring the models to “know” two durations seems biologically implausible, particularly when the target jumps to an unpredictable position. Another model (Rigoux & Guigon, 2012) employed both infinite- and finite-horizon formulations. The infinite-horizon component specifies a movement duration, which is then used to determine the corresponding finite-horizon control policy, and both the duration and control policy are updated at each time step. The model has the advantage of treating movements with and without target perturbation uniformly. However, it has to compute a large number of durations and policies equal to the number of time steps for any movement. Moreover, if target perturbation increases movement duration (as it often does), the model requires more control policies.
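The computational appeal of a purely infinite-horizon formulation can be seen in a minimal deterministic linear-quadratic sketch: one algebraic Riccati solve yields one constant feedback gain that is valid at every time step, with no preset duration. The plant and cost matrices below are illustrative stand-ins, not the cited models (which include signal-dependent noise and sensory delays):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Discretized double integrator (position error, velocity) as a stand-in
# plant; all matrices and costs are illustrative, not the cited models'.
dt = 0.01
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])    # penalize state (error) magnitude
R = np.array([[1e-4]])     # penalize control effort

# Infinite horizon: one algebraic Riccati equation, hence one constant
# feedback gain L, applicable at every time step of every movement.
P = solve_discrete_are(A, B, Q, R)
L = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# The closed loop x -> (A - B L) x is stable, so the controlled state
# converges without any preset movement duration.
rho = np.abs(np.linalg.eigvals(A - B @ L)).max()
```

Because the same gain applies at every step, no second policy or duration reset is needed when the target (and hence the state error) changes mid-movement.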

We therefore investigated whether a single, continuously applied control policy could simulate movements both with and without target perturbation. We and others recently proposed infinite-horizon stochastic optimal feedback control models that do not preset or precompute movement durations or rely on finite-horizon control policies (Huh, Todorov, & Sejnowski, 2010; Jiang, Jiang, & Qian, 2011; Qian et al., 2013). In this letter, we extend and apply this framework to target perturbation. We also model the observation that subjects can perform the perturbation task even when their vision is suppressed during the target jump and they do not see their hands during reaching (Goodale, Pélisson, & Prablanc, 1986; Pélisson et al., 1986). A recent study found that visual and proprioceptive feedbacks have not only different noise levels but also different delays (Crevecoeur, Munoz, & Scott, 2016). Although our study does not directly concern this interesting finding, we incorporated it into our model to ensure that our conclusions are valid under more realistic feedback conditions.

A related debate is the relative importance of feedforward versus feedback control of reaching. There is little doubt that reliable sensory feedback can improve performance (Bernstein, 1967; Todorov & Jordan, 2002). However, sensory feedback in the initial phase of reaching does not seem to be as important as that in the terminal phase (Beaubaton & Hay, 1986). This has led to the proposal that motor control uses a crude or even open-loop mechanism initially and switches to a refined, closed-loop mechanism later, implying a two-stage process. However, few models have considered the alternative that a single control policy may appear to assign different importance to initial and terminal feedback. We therefore extended the previous infinite-horizon model (Phillis, 1985; Jiang et al., 2011; Qian et al., 2013) to implement Weber's law of sensory perception. This law states that uncertainty in estimating a value is proportional to the value estimated. We therefore assume, for example, that the uncertainty in judging the distance between the hand and target is proportional to this distance (Baird, 1970). We were able to implement Weber's law via state-dependent noise in the sensory feedback equation because we represented the effector state relative to the target. Surprisingly, this well-known sensory law has not been incorporated into optimal feedback control models. We examined whether the law naturally rendered initial and terminal sensory feedback less and more effective, respectively, without assuming a switch between feedforward and feedback strategies.

Some preliminary results were reported in abstract form (Fangwen, Zhaoping, & Qian, 2012).

## 2 Results

(In this letter, we use *feedback* to denote sensory inputs only.)

### 2.1 Weber's Law and Different Effects of Initial and Terminal Visual Feedback

The matrix thus determines Weber fractions in our model. For the diagonal we used, the diagonal terms are proportional to Weber fractions for the components of , which include position, velocity, and acceleration (see the Appendix). We determined the ratios among the diagonal terms of using the Weber fractions of visual judgments of distance (Baird, 1970; McKee, Levi, & Bowne, 1990; Levi & Klein, 1992), velocity (McKee, Silverman, & Nakayama, 1986), and acceleration (Mueller & Timney, 2016). In other words, the properties measured in visual psychophysical experiments were assumed to be applicable to the visual guidance of the hand's reaching movements to targets. Following Crevecoeur et al. (2016), we further assumed that proprioception is faster but noisier than vision and let , , , and . A complication is that proprioception can sense the hand but not the target. Therefore, proprioceptive feedback of the state should really be viewed as resulting from the integration of proprioception of the hand and visual perception or memory of the target, and we assume that Weber's law applies to this integration. Our conclusions in this letter are not sensitive to these or other parameters. For example, we obtained qualitatively similar results when the factor relating proprioceptive and visual noises varied from 2 to 5, or when both delay constants were set to 100 ms or a value from 50 to 150 ms, or when Weber's law for proprioception was eliminated (by setting to 0 and doubling ).
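A minimal sketch of how Weber's law enters the observation equation under our relative state representation: the noise SD of each observed component grows in proportion to that component's magnitude, on top of a small state-independent floor. The Weber fraction and floor used below are illustrative values, not the fitted parameters of the model:

```python
import numpy as np

rng = np.random.default_rng(0)

def observe(x_rel, weber_fraction, base_sd, rng):
    """Noisy observation of the state relative to the target.

    Weber's law: the observation SD of each component grows in proportion
    to that component's magnitude (state-dependent noise), on top of a
    small state-independent floor. Parameter values are illustrative."""
    sd = weber_fraction * np.abs(x_rel) + base_sd
    return x_rel + sd * rng.standard_normal(x_rel.shape)

# Feedback is noisy far from the target and precise near it.
far = np.tile(np.array([0.5, 0.0, 0.0]), (5000, 1))    # [pos, vel, acc]
near = np.tile(np.array([0.01, 0.0, 0.0]), (5000, 1))
sd_far = observe(far, 0.05, 1e-3, rng)[:, 0].std()
sd_near = observe(near, 0.05, 1e-3, rng)[:, 0].std()
```

This asymmetry is the source of the initial-versus-terminal feedback difference discussed below: late in a movement the relative state is small, so its observation is precise.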

We used the optimal solution for a single-joint motor plant (see the appendix for details) to simulate the four visual feedback conditions for reaching movements in Beaubaton and Hay (1986). That study compared subjects' movements under full vision, no vision, vision in the initial half, and vision in the terminal half of the hand-position trajectory. For all conditions, the target was always visible and stationary. When the hand was invisible, subjects could not use vision to estimate the hand state, so we set . Figure 1 shows the simulated position trajectories (panel a) and the end-point variabilities (panel b) in the four conditions. Consistent with experimental findings, we found that the initial-vision condition produced a more variable end-point position than did the terminal-vision condition, and that with the end-point variability as the measure, the initial- and terminal-vision conditions were, respectively, similar to the no-vision and full-vision conditions.

Figure 1b is based on simulation results at 1 s, but our conclusion does not depend on this choice of end point. Figure 2 shows the SDs of the hand position as a function of time for the four conditions. The conclusion is the same for any end time after 0.5 s (Figure 1a shows that it is reasonable to assume that movements ended after 0.5 s). Thus, the model explains the different effects of initial and terminal visual feedback on end-point accuracy without assuming two mechanisms. Intuitively, because of Weber's law, visual feedback toward the end of a movement is highly accurate owing to the small distance between the hand and target states, and consequently, its presence versus absence largely determines end-point variability. On the other hand, our simulations suggest that initial feedback had a significant impact on the positional variability during the movement (see Figures 1a and 2). Indeed, in the early phase of movements, the terminal-vision and no-vision conditions were similar to each other. This is not surprising because these two conditions were identical up to the midpoint of each trajectory, and from that point on, the continuous Newtonian dynamics kept the two conditions similar until accurate feedback in the terminal-vision condition gradually made a difference near convergence. Likewise, at the beginning, the initial-vision and full-vision conditions were similar to each other. We note that the experiments also showed larger end-point biases in the initial- and no-vision conditions than in the terminal- and full-vision conditions (Beaubaton & Hay, 1986). Since this could be trivially explained by assuming that visual perception of the hand state is less biased than proprioception, we did not model this result explicitly.

Finally, Beaubaton and Hay (1986) found that the difference in end-point accuracy between the initial/no-vision conditions and the terminal/full-vision conditions became smaller for slower movements. To explain this result, we first defined the movement duration of a trial as the first time at which the velocity had been less than 0.05 m/s for 40 ms and then used this criterion to sort the simulated trials from fast to slow. We then plotted the end-point accuracies of the four vision conditions for the fast trials (durations less than 0.45 s) and slow trials (durations more than 0.6 s) in Figures 3a and 3b, respectively. (Each of the two groups constituted about a quarter of the total number of trials.) Consistent with the experimental finding (Beaubaton & Hay, 1986), the differences in end-point accuracy between the initial/no-vision conditions and the terminal/full-vision conditions were smaller for the slow trials.
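The duration criterion used to sort trials can be sketched as a simple function of a velocity trace. The test trace below is an arbitrary bell-shaped profile, not a model simulation:

```python
import numpy as np

def movement_duration(v, dt, v_thresh=0.05, hold=0.040):
    """Movement duration per the criterion in the text: the first time at
    which the speed has stayed below `v_thresh` (m/s) for `hold` seconds."""
    n_hold = int(round(hold / dt))
    run = 0
    for i, below in enumerate(np.abs(v) < v_thresh):
        run = run + 1 if below else 0
        if run >= n_hold:
            return (i + 1) * dt   # end of the first qualifying window
    return len(v) * dt            # speed never settled within the trace

# Arbitrary bell-shaped speed trace that settles shortly after 0.58 s.
dt = 0.001
t = np.arange(0.0, 1.0, dt)
v = 0.8 * np.exp(-((t - 0.25) / 0.2) ** 2)
duration = movement_duration(v, dt)
```

The 40 ms hold prevents a brief dip in a noisy trace from being mistaken for movement termination.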

### 2.2 Target Perturbation at Various Times

We next used the same model to simulate target-perturbation experiments. In Figure 4, the target was initially 0.5 m away from the hand but jumped farther away by 0.1 m at various times, for a total distance of 0.6 m. The two columns of the figure show the position and velocity profiles, respectively. Different rows show results for different jump times (no jump, and 0.1, 0.2, 0.3, and 0.4 s after reaching starts). Figure 5 shows the corresponding results when the target jumped closer to the hand by 0.1 m, for a total distance of 0.4 m.

The simulations in Figures 4 and 5 reproduced some key features of the target-perturbation experiments (Georgopoulos et al., 1981; Soechting & Lacquaniti, 1983; Flash & Henis, 1991; Heath et al., 1998; Goodale et al., 1986; Pélisson et al., 1986; Fishbach et al., 2005, 2007). When the target jumped early during reaching, the kinematics were similar to those of the no-jump condition, with smooth, single-peaked mean velocity profiles. When the target jumped late, however, the mean velocity profiles became less regular and showed a plateau or a second peak. Importantly, all simulations in Figures 4 and 5 were produced by a single policy with fixed estimator and control law . In all cases, the model was run continuously, without any prefixed or precomputed duration, to move the hand toward the target regardless of whether, when, or where the target jumped. At any time, the estimator combines sensory feedback with internal prediction to produce the estimated state vector for guiding control. To our knowledge, this is the first computational demonstration that a single feedback control policy with fixed parameters can simulate the different kinematic features of target-perturbation experiments.
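The essential reason a single fixed policy handles target jumps is that the state is represented relative to the target: a jump merely shifts the current state, and the same estimator and control law then reduce the new error exactly as they reduce any other error. A minimal deterministic sketch with an illustrative plant and hand-picked estimator gain (noise and sensory delays, present in the full model, are omitted here for clarity):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative 1D plant whose state is relative to the target (position
# error, velocity); gains and parameters are assumptions, not fitted.
dt = 0.005
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
C = np.eye(2)

P = solve_discrete_are(A, B, np.diag([1.0, 0.05]), np.array([[1e-3]]))
L = np.linalg.solve(1e-3 + B.T @ P @ B, B.T @ P @ A)  # fixed control law
K = 0.2 * np.eye(2)                                   # fixed estimator gain

x = np.array([[-0.5], [0.0]])   # hand starts 0.5 m from the target
x_hat = x.copy()
jump_step, jump = 40, -0.1      # target jumps 0.1 m farther at 0.2 s
pos_err = []
for k in range(600):
    if k == jump_step:
        x[0, 0] += jump         # a jump merely shifts the relative state
    u = -L @ x_hat              # the SAME policy before and after the jump
    y = C @ x                   # sensory feedback (noiseless here)
    x = A @ x + B @ u
    x_hat = A @ x_hat + B @ u + K @ (y - C @ x_hat)   # predict + correct
    pos_err.append(abs(x[0, 0]))

final_err = pos_err[-1]
```

No duration is reset and no policy is switched at the jump; the perturbation error simply enters the same predict-and-correct loop as the initial reaching error.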

In a variation of the target-perturbation paradigm (Goodale et al., 1986), subjects did not see their hands during the movement. In addition, the target jumped when the subjects' saccadic eye movement to the target was at peak speed (around the start of reaching). Consequently, the subjects were unaware of the jump because of transient saccadic suppression of vision. It was found that subjects corrected for the jump even though they were unaware of it and that the movement kinematics with and without the jump were similar. We modeled the invisible hand by setting . Since our model represents the hand state relative to the target, whose visual feedback was also suppressed around peak saccadic speed, proprioception alone could not determine the state during saccadic suppression. We therefore modeled the unaware condition by also setting for 50 ms and letting the target jump at the midpoint of this time window. When both and were 0, the first term of equation 2.7, which represents internal prediction, still estimated the state, effectively providing a memory mechanism. The results of our simulations for movements with and without target jump are shown in Figure 6. The model produced similar position and velocity profiles for the two conditions and corrected for the jump that occurred during the 50 ms blind window, reproducing the experimental findings. Two factors contributed to these results. First, although the target jumped when sensory feedbacks were suppressed, the new target location was observed *after* the saccade, which guided subsequent control. Second, as noted above, earlier sensory feedback was not as important as later feedback because of Weber's law. The similarity between the jump and no-jump conditions was due to the fact that the jump occurred at the start of the hand movement (Goodale et al., 1986), regardless of hand visibility during the rest of the reach.
Indeed, the jump and no-jump conditions were still similar when we simulated the visible hand condition (results not shown).

### 2.3 Submovement Decomposition

We discussed in section 1 the debate over whether the number of decomposed submovements indicates the number of discrete motor impulses or commands for movement control. We investigated this issue by applying a submovement decomposition method (Lee et al., 1997) to the velocity profiles of sample trials produced by our model for target-perturbation experiments (see Figures 4 and 5). Three examples of decomposition for each of various jump conditions are shown in Figure 7. By applying the method to 1000 velocity profiles from each condition, we obtained the histograms in Figure 8. Similar to the results of decomposing observed velocity profiles (Fishbach et al., 2005, 2007), we found that the no-jump and early-jump conditions produced fewer decomposed submovements than the late-jump conditions. For a given jump condition, different trials could produce different numbers of submovements because the sampled noises (drawn from the same distributions) fluctuated across trials. For a given trial, the number of submovements depends on the shape of the standard profile and the criterion of fit. With a symmetric standard profile (Lee et al., 1997) and a criterion of accounting for 99% of the variance, we found that all velocity profiles we tried, including those from the no-jump condition, produced multiple submovements. Importantly, the velocity profiles were all simulated with exactly the same single control policy run continuously in time. Therefore, our results suggest that the number of decomposed submovements does not necessarily reflect the number of discrete motor impulses, commands, or policies.
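A simplified greedy stand-in for such a decomposition procedure (not Lee et al.'s actual algorithm) repeatedly adds the single grid-searched minimum-jerk component that best reduces the residual, stopping once 99% of the variance is accounted for. The demonstration profile below is a noiseless synthetic two-bump curve, used only to illustrate the method:

```python
import numpy as np

def mj(t, onset, amp, dur):
    """Minimum-jerk speed curve used as the standard profile."""
    s = (t - onset) / dur
    v = 30.0 * amp / dur * s**2 * (1.0 - s)**2
    return np.where((s >= 0.0) & (s <= 1.0), v, 0.0)

def decompose(t, v, max_k=8, var_goal=0.99):
    """Greedily add the grid-searched component that most reduces the
    residual, until `var_goal` of the variance is accounted for."""
    resid, parts = v.copy(), []
    ss_total = np.sum((v - v.mean()) ** 2)
    onsets = np.linspace(t[0] - 0.2, t[-1], 40)
    durs = np.linspace(0.15, 0.8, 20)
    for _ in range(max_k):
        best = None
        for o in onsets:
            for d in durs:
                shape = mj(t, o, 1.0, d)
                denom = shape @ shape
                if denom < 1e-12:
                    continue
                a = max(shape @ resid / denom, 0.0)  # nonnegative amplitude
                err = np.sum((resid - a * shape) ** 2)
                if best is None or err < best[0]:
                    best = (err, o, a, d)
        _, o, a, d = best
        parts.append((o, a, d))
        resid = resid - mj(t, o, a, d)
        if 1.0 - np.sum(resid ** 2) / ss_total >= var_goal:
            break
    return parts

# A noiseless synthetic profile made of two components requires
# multiple fitted submovements to approach the variance criterion.
t = np.linspace(0.0, 1.0, 201)
v = mj(t, 0.0, 0.5, 0.6) + mj(t, 0.35, 0.1, 0.5)
parts = decompose(t, v)
```

Note that the count of fitted components depends on the grid, the standard profile, and the variance criterion, which is precisely why it need not equal the number of underlying commands.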

A related experimental finding is that velocity profiles of slower movements require more submovements to fit (Morasso, Ivaldi, & Ruggiero, 1983; Darling, Cole, & Abbs, 1988; van der Wel, Sternad, & Rosenbaum, 2009; Levy-Tzedek, Krebs, Song, Hogan, & Poizner, 2010; Shmuelof, Krakauer, & Mazzoni, 2012). To explain this finding, we used the same duration criterion as above to define the duration of a trial and sorted the simulated trials of the no-jump condition from fast to slow. We then divided the trials into fast, medium, and slow groups according to whether the duration was less than 0.5 s, between 0.5 and 0.6 s, or greater than 0.6 s, and determined the submovement decomposition for each group. The results in Figure 9 show that our single-policy model reproduced the experimental finding that slower trials required more submovements to fit. Again, the conclusion is that the number of submovements does not necessarily reflect the number of underlying motor impulses, commands, or control policies.

### 2.4 Two-Dimensional Target Perturbation

In the above, we modeled one-dimensional target perturbation in which the target jumped parallel to the movement path to become either closer to or farther away from the hand. We also used a two-dimensional motor plant described in Liu and Todorov (2007) to simulate target jumps perpendicular to the initial movement path. Figure 10 shows example simulations in which the hand started moving horizontally to a target 0.3 m to the right, which then jumped perpendicular to the initial movement path by 0.05 m in either direction. The jump time was 100, 200, or 300 ms after movement onset. Since in the experiment of Liu and Todorov (2007) subjects did not see their hands during reaching, we set the vision part of the Kalman matrix as before. We averaged movement trajectories over 4000 trials in order to compare them with the reported average results (Liu & Todorov, 2007). Consistent with experimental observations, the simulated average hand trajectories turned gradually toward the new target locations, with sharper turns for later jumps (see Figure 10a); the average speed profiles became more irregular for later jumps (see Figure 10b); and the movement duration was longer for later jumps (see Figure 10c). Following Liu and Todorov (2007), the terminations of the curves in Figure 10c indicate movement durations, where the termination criterion was the time at which the average hand velocity had been smaller than 0.5 cm/s for 40 ms. (Note that this criterion applies to the average velocity across trials and differs from the one for individual trials used above.) As we mentioned in section 1, Liu and Todorov (2007) simulated these perturbations with a finite-horizon model and had to preset two movement durations to compute two control policies for each jump time. We used the same motor plant, but because of the infinite-horizon formulation, we were able to apply a single policy to all the simulations in Figure 10 without presetting any movement duration.

## 3 Discussion

In this study, we first extended an infinite-horizon stochastic optimal feedback control model (Phillis, 1985; Jiang et al., 2011; Qian et al., 2013) by incorporating Weber's law of sensory perception. We were able to achieve this by adding state-dependent noise to the observation equation because the model represents the effector state relative to the target. Models that use separate, absolute representations of the effector and target in the state vector would need to compare the effector and target states to realize Weber's law. We also considered visual and proprioceptive feedback with different delays and noise levels. We analytically derived an optimal solution for the extended model. We then applied the model to show that terminal vision was more effective in reducing end-point variability than initial vision, that these two partial-vision conditions produced end-point accuracies similar to those of the full-vision and no-vision conditions, respectively, and that the differences among the conditions became smaller for slower movements. These simulation results explain Beaubaton and Hay's findings coherently without assuming a switch between crude and fine or between open- and closed-loop strategies. We also applied the same model (and parameters) to simulate target-perturbation experiments for various times of target jump. Consistent with experimental findings (Georgopoulos et al., 1981; Soechting & Lacquaniti, 1983; Flash & Henis, 1991; Heath et al., 1998; Goodale et al., 1986; Pélisson et al., 1986; Fishbach et al., 2005, 2007), the model produced smooth, single-peaked mean velocity profiles when the target jumped early and irregular mean profiles with a plateau or a second peak when the target jumped late. Additionally, the model explained the observations that subjects can perform the task even when they do not see the jump and when their hands are invisible during reaching (Goodale et al., 1986).
Moreover, we found that the simulated velocity profiles, including those from the no-jump condition, appeared to have multiple submovements, and the number of submovements was larger for trials that were slower or perturbed later. Importantly, all of these simulations were produced by a single, fixed control policy. The model did not assume different mechanisms at different periods of movements or for different jump times and locations nor require different numbers of policies for movements of different durations. Finally, we simulated two-dimensional target perturbations in which the target jumped perpendicular to the initial movement direction. Here again we used a single policy to simulate different jump times and directions.

Our study demonstrates for the first time that a single control policy can explain a broad range of kinematic features observed in different experimental conditions and trials. This demonstration offers a more parsimonious interpretation of the observations. Thus, ineffective initial visual feedback does not necessarily imply open-loop control at the start of movements. Indeed, in the extreme case of no sensory information about the initial state vector at all, it would be impossible to decide in which direction to move the hand. Likewise, different kinematic features for different perturbation times or different movement durations do not necessarily imply different control strategies, and the number of peaks or decomposed submovements in a velocity profile does not necessarily reflect the number of motor impulses or commands or policies. Instead, our work suggests that variations in experimental conditions might naturally lead to variations in kinematics without changes in underlying control mechanisms or policies. For example, the jump time determines when the large perturbation error occurs during a movement, and that could be sufficient to account for the jump-time dependence of kinematics. Similarly, different trials have different samples of noise sequences, and this fluctuation could be sufficient to explain trial-by-trial variation of kinematics.

Most optimization models for biological motor control use a finite-horizon formulation (Flash & Hogan, 1985; Uno, Kawato, & Suzuki, 1989; Harris & Wolpert, 1998; Todorov & Jordan, 2002; Scott, 2004; Todorov, 2004; Diedrichsen, Shadmehr, & Ivry, 2010). Because they require movement duration to define the cost functions, these models have to be given the duration before solving the optimization problems and initiating movements. To avoid this problem, we and others have recently advocated infinite-horizon formulations (Huh et al., 2010; Jiang et al., 2011; Qian et al., 2013), which can predict movement duration instead of prefixing it. The contrast between finite- and infinite-horizon formulations is particularly striking for modeling target-perturbation experiments. Because target jump changes movement duration, finite-horizon models have to be given two preset durations—one for the original target location and the other for the final target location. Moreover, the second duration depends on when and where the target jumps and the hand state at the time of jump, and finite-horizon models have to be given different second durations for different situations. There is also a model that uses both infinite- and finite-horizon approaches, with the former determining the movement duration and the latter specifying a control policy for that duration (Rigoux & Guigon, 2012). Because the model updates the duration and the corresponding policy at each time step, its number of required finite-horizon policies equals the number of time steps and changes with different movement durations for different conditions. We showed in this letter that our purely infinite-horizon approach provides a simpler explanation of target-perturbation experiments. Our model does not need to be given or compute movement durations explicitly, and it uses a single policy to move the hand toward the target regardless of whether, when, or where the target jumps or whether the jump is visible.

Computational modeling shows only theoretical possibilities. Whether stochastic optimal feedback control models or submovement decomposition models better account for biological motor control is an empirical question. Similarly, only experiments can help differentiate the validity of infinite-horizon models presented here and elsewhere (Huh et al., 2010; Jiang et al., 2011; Qian et al., 2013), the finite-horizon models (Harris & Wolpert, 1998; Todorov & Jordan, 2002; Scott, 2004; Tanaka, Krakauer, & Qian, 2006), and the mixed formulation (Rigoux & Guigon, 2012). We previously discussed the similarities and differences, and the strengths and weaknesses, of finite- and infinite-horizon models and speculated that they both might be employed by biological systems (Qian et al., 2013). Specifically, parsimonious, infinite-horizon policies might provide default control mechanisms, whereas finite-horizon policies might be called for when additional optimization and the associated computational cost are justified. Our study is broadly consistent with this speculation: it shows that an infinite-horizon model with a single policy is able to perform a variety of tasks, but a finite-horizon model with multiple policies might help refine these tasks (e.g., by forcing their completion in a prefixed time frame with less trial-by-trial variation of movement duration).

We already noted that because our model represents the effector state relative to the target, we can implement Weber's law easily by introducing state-dependent noise in the observation equations. Another consequence of this relative, hand-centered representation is that the control signal and the corresponding force vanish as the effector approaches the target even for constant control-law matrix (see equation 2.8). (If force is needed to maintain the effector, one can simply add a component to the state vector or a term in equation 2.8.) In contrast, other models often use a state vector that represents the effector and target separately with respect to a reference frame not centered on the hand (e.g., Todorov, 2005). With such absolute representation, the state vector is not zero even when the effector reaches the target. Consequently, has to vary with time or state and drop to zero to eliminate the control signal (and the force) when the hand reaches the target. Obviously, different choices of the reference frame produce different absolute representations of the effector and target and, consequently, different dependence on time or state. Experimentally measured force gains do change with time or state (Liu & Todorov, 2007; Dimitriou, Wolpert, & Franklin, 2013), although it is unclear which absolute representation they support. These and other findings suggest that the brain may use both absolute and relative state representations (Graziano, 2001; Dimitriou et al., 2013). How to determine the reference frame for the absolute-representation component is an open question for future investigation.
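The point about the relative representation can be reduced to two lines of algebra: with the state defined relative to the target, a constant control-law matrix yields a control signal that vanishes exactly when the hand reaches the target and comes to rest. The gain values below are arbitrary illustrations:

```python
import numpy as np

L = np.array([[40.0, 12.0]])            # illustrative constant gains

x_rel_far = np.array([[-0.5], [0.0]])   # hand 0.5 m from the target
x_rel_done = np.array([[0.0], [0.0]])   # hand at the target, at rest

u_far = (-L @ x_rel_far).item()         # nonzero drive toward the target
u_done = (-L @ x_rel_done).item()       # exactly zero: no residual force
```

With an absolute state vector [hand; target], the same constant gain acting on the hand components alone would not vanish at the target unless the hand and target gains exactly canceled, which is why such models need time- or state-varying gains.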

Our explanation of Beaubaton and Hay's (1986) experiments relies on Weber's law and is not identical to that based on the difference between foveolar and peripheral vision. The foveola spans only about 1 degree of visual angle, roughly the size of the thumbnail at arm's length. In contrast, hand movements in many experiments and in daily life often span visual angles of 10 degrees or more, so that during most of such movements, the eyes can foveate either the hand or the target but not both. Therefore, foveal vision of both hand and target is available only when the hand has almost reached the target.
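The visual-angle figures above can be checked with the standard formula for the angle subtended by an object; the object size and viewing distance below are rough, assumed values:

```python
import math

def visual_angle_deg(size_m: float, distance_m: float) -> float:
    """Visual angle (degrees) subtended by an object of a given size
    at a given viewing distance."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

# A ~1.5 cm thumbnail at ~60 cm (arm's length) subtends roughly 1.4 degrees,
# on the order of the ~1 degree foveola, whereas a 15 cm reach at the same
# distance subtends over 14 degrees, far beyond it.
thumb_deg = visual_angle_deg(0.015, 0.60)
reach_deg = visual_angle_deg(0.15, 0.60)
```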

Our model makes testable predictions. First, in Figures 1 and 2, although toward the end of the movements, the initial- and no-vision conditions were similar and the terminal- and full-vision conditions were similar, the model predicts the opposite before the final convergence on the target: there is greater positional variance in the terminal- and no-vision conditions than in the initial- and full-vision conditions. Second, a closely related prediction is that the time to reach the midpoint of the position trajectory should have much larger variance in the terminal- and no-vision conditions than in the initial- and full-vision conditions. Confirmation of these predictions would provide further evidence for the effects of visual feedback even at the beginning of movements. Third, although initial feedback is ineffective in reducing final positional variance, it should still be useful for crude aspects of movements such as general direction. The experiment of Bard, Hay, and Fleury (1985) seems to support this prediction. Finally, in a previous paper (Qian et al., 2013) we compared different predictions from the finite- and infinite-horizon control models (e.g., the absence versus presence of trials in which the hand overshoots the target). This study suggests another test that could potentially distinguish between the models. As we mentioned in section 1, the finite-horizon models require two movement durations to compute two control policies for use before and after the target jump. If the target jumps at an unpredictable time or to an unpredictable location, it is impossible to know the second movement duration before the jump. If, in addition, the jump is undetected because of saccadic suppression of vision during the jump (Goodale et al., 1986; Pélisson et al., 1986), then the finite-horizon models would not know when to reset the movement duration or switch the control policy.
In contrast, according to our infinite-horizon model, movement duration is not preset but an emergent property that depends on when and where the target jumps regardless of whether the jump moment is detected. Therefore, for an unpredictable and undetected target jump, our infinite-horizon model predicts a change of duration while the finite-horizon models do not.

In summary, we have proposed an infinite-horizon stochastic optimal feedback control model for understanding target-perturbation experiments. Because we incorporated Weber's law, the model also explains differences between initial and terminal visual feedback. Our model can reproduce different kinematic features under various conditions with a single control policy and suggests a parsimonious new interpretation of relevant experimental findings.

## Appendix: Model Details and Derivation

As noted in the text, we extended the formulation of Phillis (1985) and Qian et al. (2013) in two ways. First, we added a state-dependent noise term to each feedback equation to implement Weber's law of sensory perception. Second, we included multiple sensory feedbacks with different delays and noise levels to represent, for example, differences between vision and proprioception. In equations 2.1 and 2.2, the state, the control, and each sensory feedback are vectors, and the coefficient matrices are of matching dimensions; the matrices in the dynamics equation are determined by the motor plant and Newtonian mechanics (see the examples that follow). More explanations can be found in Qian et al. (2013). We transformed these equations and the delay equation 2.3 into equations 2.5 and 2.6 by defining a new, enlarged state and observation (see equation 2.4) that incorporate the delayed feedbacks; the procedure can be extended to more than two feedbacks. Equations 2.7 and 2.8 are the standard estimation and control-law equations, where the state estimate has the same dimension as the new state.
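The Weber's-law feedback term can be sketched schematically as multiplicative observation noise whose scale grows with the relative state; the noise levels below are illustrative, not the model's fitted parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def observe(x, C, sig_add, sig_web):
    """One noisy sensory feedback: y = Cx plus state-independent noise
    and state-dependent (Weber) noise that scales with the signal size."""
    y = C @ x
    return (y
            + sig_add * rng.standard_normal(y.shape)            # state-independent
            + sig_web * np.abs(y) * rng.standard_normal(y.shape))  # Weber term

C = np.eye(2)
far = np.array([[-0.5], [0.0]])    # effector far from the target
near = np.array([[-0.02], [0.0]])  # effector near the target

# Position-feedback scatter grows with distance from the target (Weber's law):
far_sd = np.std([observe(far, C, 0.001, 0.1)[0, 0] for _ in range(2000)])
near_sd = np.std([observe(near, C, 0.001, 0.1)[0, 0] for _ in range(2000)])
```

Because the state is hand-centered, the same term automatically makes position feedback coarse far from the target and precise near it, which underlies the initial- versus terminal-vision difference discussed in the text.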

To model a target jump to a new position, we modified the relevant position component of the state vector while keeping the other components unchanged. Movement is simulated using the Euler method with a fixed time step.
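The simulation step, including a target jump, can be sketched as follows (a deterministic 1-D plant with illustrative gains; the full model also carries the noise, delay, and estimation terms described above):

```python
import numpy as np

# Euler simulation of a reach with a mid-movement target jump. Because the
# state is relative to the target, the jump only shifts the position
# component -- the single constant control policy never changes.
dt = 0.005
A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double-integrator plant
B = np.array([[0.0], [1.0]])
L = np.array([[20.0, 9.0]])              # constant gains (illustrative values)

x = np.array([[-0.2], [0.0]])            # 20 cm short of the target, at rest
for step in range(2000):
    if step == 100:                      # target jumps 5 cm farther at t = 0.5 s
        x[0, 0] -= 0.05                  # only the relative position changes
    x = x + (A @ x + B @ (-L @ x)) * dt  # Euler step under the same policy
```

No policy switch or movement-duration reset is needed at the jump; the effector simply continues toward wherever the target now is.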

## Acknowledgments

We thank Fangwen Zhai for his help with some early simulations and Li Zhaoping for helpful discussions and kind support. This study was supported by AFOSR grant FA9550-15-1-0439 and the Study of Brain-Inspired Computing System of Tsinghua University program (grant 20141080934).