When subjects adapt their reaching movements in the setting of a systematic force or visual perturbation, generalization of adaptation can be assessed psychophysically in two ways: by testing untrained locations in the work space at the end of adaptation (slow postadaptation generalization) or by determining the influence of an error on the next trial during adaptation (fast trial-by-trial generalization). These two measures of generalization have been widely used in psychophysical studies, but the reason that they might differ has not been addressed explicitly. Our goal was to develop a computational framework for determining when a two-state model is justified by the data and to explore the implications of these two types of generalization for neural representations of movements. We first investigated, for single-target learning, how well standard statistical model selection procedures can discriminate two-process models from single-process models when learning and retention coefficients were systematically varied. We then built a two-state model for multitarget learning and showed that if an adaptation process is indeed two-rate, then the postadaptation generalization approach primarily probes the slow process, whereas the trial-by-trial generalization approach is most informative about the fast process. The fast process, due to its strong sensitivity to trial error, contributes predominantly to trial-by-trial generalization, whereas the strong retention of the slow system contributes predominantly to postadaptation generalization. Thus, when adaptation can be shown to be two-rate, the two measures of generalization may probe different brain representations of movement direction.
Generalization of motor adaptation has been assessed in two different ways. One approach, postadaptation generalization, is to probe how a fully learned local remapping generalizes to other unlearned locations or directions (Imamizu, Uno, & Kawato, 1995; Gandolfo, Mussa-Ivaldi, & Bizzi, 1996; Krakauer, Pine, Ghilardi, & Ghez, 2000; Mattar & Ostry, 2007). Another way is to assess adaptation by examining how errors experienced on trial k transfer to another part of the work space in trial k + 1 using a single-state state-space model (trial-by-trial generalization) (Thoroughman & Shadmehr, 2000; Baddeley, Ingram, & Miall, 2003; Donchin, Francis, & Shadmehr, 2003; Cheng & Sabes, 2007; Francis, 2008). In both cases, the goal is to assess how a learned remapping generalizes to untrained areas of the work space and thereby determine the nature of the representation used by the brain to encode the learned remapping (Shadmehr, 2004).
This study focuses on directional generalization for multitarget adaptation. We have shown that in the case of eight-target visuomotor rotation adaptation, the two generalization approaches yield similar narrow generalization patterns that do not extend beyond adjacent targets separated by 45 degrees (Tanaka, Sejnowski, & Krakauer, 2009). We demonstrated that a population-coding model composed of narrowly tuned computational units successfully explained both forms of generalization and several additional experimental observations, such as the wider pattern of generalization and the slower adaptation speed with an increasing number of training targets. In contrast with these results for rotation adaptation, for force-field adaptation, the two generalization approaches yielded different generalization functions, with trial-by-trial adaptation affecting directions as far as 180 degrees (Thoroughman & Shadmehr, 2000; Donchin et al., 2003), whereas postadaptation generalization was limited to directions within 90 degrees of the training targets (Mattar & Ostry, 2007).
How might these differences in generalization for rotations and viscous force fields be explained? Recently, an innovative two-state model of short-term motor adaptation was proposed that posited a fast process with a fast learning rate but weak retention and a slow process with a slow learning rate but strong retention (Smith, Ghazizadeh, & Shadmehr, 2006). A subsequent study demonstrated that it is the slow process that is retained as a motor memory (Joiner & Smith, 2008). This shows that when initial adaptation is made up of more than one process, these processes may go on to show divergent behavior on subsequent probing and may therefore be separable. In addition, the two-state model, or its multiple-timescale extension, was successfully applied to saccadic gain adaptation (Körding, Tenenbaum, & Shadmehr, 2007; Ethier, Zee, & Shadmehr, 2008). This suggests that an explanation for why rotation adaptation yielded a single generalization function for both approaches whereas force-field adaptation yielded two different generalization functions is that rotation adaptation is one-rate and force-field adaptation is two-rate.
This study had two goals. The first was to investigate the reliability of parameter estimation techniques to determine whether a process is indeed multirate instead of single rate. The second was to show analytically and by computational simulation that if adaptation is indeed two-rate, then the postadaptation approach probes generalization of the slow process, whereas the trial-by-trial approach evaluated with the single-state model probes generalization of the fast process.
2. Statistical Tests for Determination of a State-Space Model's Dimensionality and Parameters
In a recent study, a computational model with fast and slow timescales successfully explained savings, anterograde interference, spontaneous recovery, and rapid unlearning in adaptation to a viscous force field (Smith et al., 2006). One critical question when applying multirate models to experimental data is the degree to which the model's parameters need to differ in order for a standard statistical test to be able to detect multiple processes. We investigated this issue by fitting state-space models to artificially generated error data and performing statistical tests. We first simulated a two-state model and then determined how well standard statistical tests can recover the state-space model's parameters.
Thus, a double-exponential time course of adaptation can occur with a two-state model but not with a single-state model. Therefore, if experimental data exhibit two timescales, a single-state model is incapable of modeling the data and a two-state model is justified. The same result also applies to multitarget training: a single-state model has a single timescale and hence cannot explain a learning curve with more than one time constant.
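This distinction can be illustrated with a minimal simulation (a sketch; the retention and learning coefficients below are illustrative choices, not fitted values): the closed-loop error of a single-state model decays at exactly one geometric rate, whereas a two-state model mixes two rates.

```python
import numpy as np

def simulate(A, b, f=1.0, n_trials=100):
    """Noise-free state-space model: x_{k+1} = A x_k + b e_k,
    with error e_k = f - sum(x_k) under a constant perturbation f."""
    A = np.atleast_2d(A)
    b = np.atleast_1d(b)
    x = np.zeros(len(b))
    errors = np.empty(n_trials)
    for k in range(n_trials):
        e = f - x.sum()
        errors[k] = e
        x = A @ x + b * e
    return errors

# Single state: one retention (a) and one learning (b) coefficient.
err1 = simulate(A=[[0.95]], b=[0.10])

# Two states: fast (weak retention, strong learning) plus
# slow (strong retention, weak learning).
err2 = simulate(A=np.diag([0.90, 0.995]), b=[0.20, 0.02])
```

For the single-state run, the deviation of the error from its asymptote f(1 − a)/(1 − a + b) shrinks by the constant factor a − b on every trial; the two-state run has no such constant ratio, which is the double-exponential signature referred to above.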
2.1. Maximum Likelihood Method.
2.2. Prediction Error Estimation Method.
We used the same parameterization of the state-space models as defined above for ML estimation. As in the ML method, we optimized the parameter vectors, computed the FPEs, and selected the state-space model by the FPE criterion.
2.3. Numerical Results.
2.3.1. Dimensionality Estimation.
We created surrogate error data using the two-state state-space model and estimated the dimensionality using the two statistical model selection methods described. Several factors affect statistical tests of a state-space model's dimensionality and parameter estimation: noise strength, the length of the artificial error data, and the state-space model parameters. First, if the noise strength is substantial, multiple time constants in error data may be misinterpreted as a single-time-constant process. Second, the values of the retention and learning coefficients for the fast and slow processes in the two-state model need to differ enough to be distinguishable from the single-state model. Finally, the number of error trials, N, determines the balance between the goodness of fit and the number of parameters. To simplify our discussion, we focused on the influence of noise strength and state-space model parameter values by fixing the number of simulated trials to 100.
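As a simplified illustration of such a selection procedure (using exponential curve fits and a gaussian-likelihood AIC rather than the full state-space ML machinery described in the text; the rates and noise level are illustrative), one can generate surrogate two-timescale data and let AIC choose between one and two exponentials:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

def single_exp(k, c, lam):
    return c * lam**k

def double_exp(k, c1, l1, c2, l2):
    return c1 * l1**k + c2 * l2**k

# Surrogate error data: two timescales plus gaussian measurement noise.
k = np.arange(100)
data = 0.6 * 0.70**k + 0.4 * 0.985**k + rng.normal(0.0, 0.01, k.size)

def aic(model, p0, bounds):
    popt, _ = curve_fit(model, k, data, p0=p0, bounds=bounds)
    mse = np.mean((data - model(k, *popt)) ** 2)
    # gaussian log-likelihood up to an additive constant,
    # penalized by 2 * (number of parameters)
    return k.size * np.log(mse) + 2 * len(popt)

aic1 = aic(single_exp, [1.0, 0.9], ([0, 0], [2, 1]))
aic2 = aic(double_exp, [0.5, 0.7, 0.5, 0.99], ([0, 0, 0, 0], [2, 1, 2, 1]))
# At this noise level the two-exponential model should win despite its
# extra parameters; stronger noise would shift the choice toward one.
```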
Figures 2A to 2C show the dimensionality estimated by the ML method and Figures 2D to 2F the PE method. Three general conclusions follow from these simulation results. First, as the noise strengths increased, multiple timescales were lost, and a single-state model with a single time constant was favored, as we expected. Second, the retention coefficients (a's in equation 2.2) needed to differ in order for multiple processes to be identified, whereas the difference between the learning coefficients (b's in equation 2.2) had little influence on the dimensionality estimation. Finally, the ML and PE methods yielded qualitatively similar estimation results, although the PE method appeared more robust to noise than the ML method.
2.3.2. Parameter Estimation.
We then asked how well the two methods could reproduce the state-space parameters. This time we fixed the state-space parameters (a(s) = 0.99, b(f) = 0.20, r(a) = r(b) = 0.20) to generate artificial data and computed the estimated ratios r(a) = a(f)/a(s) and r(b) = b(s)/b(f). This procedure was iterated 1000 times to obtain estimated distributions of the ratios. We used two different noise levels: (1) σ2w = 1 × 10−4 and σ2v = 3 × 10−4 and (2) σ2w = 5 × 10−4 and σ2v = 15 × 10−4.
Figure 3 summarizes the parameter values obtained by the ML method (red histograms in Figures 3A and 3B) and by the PE method (blue histograms in panels A and B). The two estimation methods gave qualitatively similar results, although the ML provided tighter distributions. The estimated distributions of the retention coefficient ratio were almost symmetrical and centered at the true value, whereas the estimated distributions of the learning coefficient ratio were skewed and peaked at the true value. The estimation of the learning coefficients was found to be more precise than that of the retention coefficients in both estimation methods. When the noise strengths were increased (see Figures 3C and 3D), the estimated distributions became broader, but essentially the same trend was observed. In summary, our simulations showed that the ML and PE estimation methods provided similar results for both the dimensionality and the parameter values of state-space models.
2.4. Robustness of the Estimation Methods.
Here we examine the robustness of the estimation methods when simplifying assumptions made in simulating motor adaptation processes are violated: (1) that process and measurement noise are gaussian, (2) that the learning function in response to movement error is linear, and (3) that perturbations are constant. Although these assumptions are often made in system identification, they are not strictly valid, so there is a risk of applying an inappropriate state-space model and producing biased parameter estimates.
We therefore relaxed these assumptions in simulations of the two-state state-space model (see equation 2.1), and tested how well the parameters were estimated. Both the PE and ML methods yielded similar results in these tests, but only the results of ML method will be shown. The covariances of process and measurement noises were fixed at σ2w = 1 × 10−4 and σ2v = 3 × 10−4, respectively, and the state-space model parameters were a(s) = 0.99, b(f) = 0.20, and r(a) = r(b) = 0.20.
We first investigated the effects of nongaussian process and measurement noise. We used a Pearson distribution parameterized by its first four moments (mean, variance, skewness, and kurtosis). The skewness and kurtosis were systematically varied for both the process and measurement noise. Highly nongaussian distributions, such as bimodal distributions, were not considered. The ML estimation method was applied to artificially generated error data for two levels of skewness, −0.5 and 0.5 (shown in Figure 4A). The results in Figures 4B and 4C confirmed that the parameter estimation was little affected by these moderately skewed distributions. Next, simulated artificial error data were generated (see Figure 4D) at four levels of kurtosis (2.0, 2.5, 3.5, and 4.0). The ML parameter estimation was minimally affected by varying the kurtosis over a wide range of subgaussian (kurtosis smaller than 3) and supergaussian (kurtosis larger than 3) distributions (see Figures 4E and 4F).
Finally, a random perturbation (gaussian perturbations with a mean of 0 and standard deviation of 1) was used to generate artificial error data, to which the ML method was applied. As in the case of constant perturbations, the estimated values for the ratios of retention and learning coefficients were recovered (see Figures 5D and 5E). Estimation of the learning coefficients was more accurate (i.e., had smaller variance), but estimation of the retention coefficients was less accurate (i.e., larger variance) for random perturbation than for constant perturbation.
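A sketch of this comparison (parameter values follow section 2; the ratio definitions and simulation details are our own reading, not the paper's exact code):

```python
import numpy as np

# Parameters from the text: a(s) = 0.99, b(f) = 0.20, r(a) = r(b) = 0.20,
# which we take to imply a(f) = 0.198 and b(s) = 0.04.
a = np.array([0.198, 0.99])
b = np.array([0.20, 0.04])
sig_w, sig_v = np.sqrt(1e-4), np.sqrt(3e-4)

def simulate_errors(perturbation, rng):
    """Two-state model driven by a perturbation sequence, with process
    and measurement noise added on every trial."""
    x = np.zeros(2)
    errors = np.empty(len(perturbation))
    for k, f in enumerate(perturbation):
        e = f - x.sum() + rng.normal(0.0, sig_v)       # measured error
        errors[k] = e
        x = a * x + b * e + rng.normal(0.0, sig_w, 2)  # noisy state update
    return errors

rng = np.random.default_rng(1)
const_err = simulate_errors(np.ones(100), rng)               # constant perturbation
rand_err = simulate_errors(rng.normal(0.0, 1.0, 100), rng)   # random perturbation
```

Under the constant perturbation the error decays toward a small asymptote, whereas under the random perturbation the errors remain large and variable, which is what makes the trial-to-trial error fluctuations more informative about the learning coefficients.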
In summary, the estimation methods were robust even when some of the standard assumptions were violated, giving us more confidence that the fitting procedures can safely be applied to experimental data.
2.5. Reliability of Dimensional Estimation.
To apply our estimation methods reliably to experimental data, we need to assess how often multiple processes are identified from data generated from a single-state model, and vice versa (false positives). We thus generated error data by simulating single- and two-state models with various numbers of trials and applied our dimensional estimation methods.
A range of trial numbers was tested from 60 to 240 in steps of 30 trials (the initial 30 trials were baseline, and the perturbation was turned on thereafter). Estimated dimensions were computed as functions of the number of trials. Dimensional and parameter estimation depends on factors other than the number of trials, such as the values of the retention and learning coefficients and the noise variances. According to Figure 2, the retention coefficients (a) were more influential than the learning coefficients (b), so several values of the retention coefficients were examined while the learning coefficients were fixed. First, the ML estimation method was applied to error data generated with the single-state model (see equation 2.13). The value of the retention coefficient was set to 0.80, 0.90, or 0.95, and the learning coefficient was set to 0.2. Three levels of noise magnitude were considered. For each number of trials, the ML estimation was applied to 50 independent realizations, and the mean of the estimated dimensions across runs was used as the estimated dimensionality. The probability of identifying the correct dimensionality (one, in this case) increased gradually with the number of trials (see Figure 6). The dimensional estimation method was also quite robust against noise. For noise magnitudes in the range reported experimentally for visuomotor rotation (see Figure 6A), 100 trials should be sufficient to achieve the correct result with a probability of over 90%.
Next, the ML estimation method was applied to error data generated with the two-state model, equations 2.1 and 2.2. The retention coefficient of the slow process (a(s)) was fixed to 0.99, and three values of the ratio r = a(f)/a(s) were examined (0.80, 0.90, and 0.95). The learning coefficients b(f) and b(s) were set to 0.2 and 0.02, respectively. As in the previous simulation, three sets of noise variances were considered. The results showed that, as in the case of the single-state model, the probability of correctly estimating the model dimensionality was a gradually increasing function of the number of trials, and the difference between the time constants of the two processes was relevant in recovering the correct dimensionality (see Figure 7). If the ratio of retention coefficients was 0.8, the estimation method could recover the correct dimensionality within 200 trials. When the number of trials is small or the noise is substantial, there is in general a bias toward a single-state model, so the AIC is biased toward the simpler model in this case, as we expected.
These results suggest a heuristic for estimating the dimensionality from experimental data: acquire error data for at least a few hundred trials and apply the dimensional estimation method to subsets of the data of increasing length. If the estimated dimensionality tends to increase with the number of trials, multiple timescales are indicated. If instead the estimated dimensionality decreases with an increasing number of trials, a single-state model is selected.
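The heuristic can be sketched as follows (again substituting exponential curve fits with a gaussian-likelihood AIC for the full state-space procedure; the decay rates and noise level are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def estimated_dim(y):
    """Choose 1 or 2 timescales by AIC on single- vs double-exponential fits."""
    k = np.arange(len(y))
    def single(k, c, lam):
        return c * lam**k
    def double(k, c1, l1, c2, l2):
        return c1 * l1**k + c2 * l2**k
    aics = []
    for model, p0, bnd in [(single, [1.0, 0.9], ([0, 0], [2, 1])),
                           (double, [0.5, 0.7, 0.5, 0.99],
                            ([0, 0, 0, 0], [2, 1, 2, 1]))]:
        popt, _ = curve_fit(model, k, y, p0=p0, bounds=bnd)
        mse = np.mean((y - model(k, *popt)) ** 2)
        aics.append(len(y) * np.log(mse) + 2 * len(popt))
    return 1 if aics[0] <= aics[1] else 2

rng = np.random.default_rng(2)
k = np.arange(240)
data = 0.6 * 0.70**k + 0.4 * 0.99**k + rng.normal(0.0, 0.01, k.size)

# Apply the estimator to nested subsets of increasing length;
# a trend toward dimension 2 indicates multiple timescales.
dims = {n: estimated_dim(data[:n]) for n in (60, 120, 240)}
```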
3. A Two-State State-Space Model for Multitarget Adaptation
The original formulation of two-state model (Smith et al., 2006) was designed to describe single-target adaptation and is not able to describe generalization to multiple targets. Here we extended their model to a general state-space model for directional errors to multiple targets with two states composed of fast and slow adaptation processes.
Here, to reduce the multitarget model to a single-target model, an average over independent runs with randomized target order was assumed. Although this is a standard procedure in machine learning, it is not possible in psychophysical experiments, where a fixed pseudo-randomized target order is frequently used. We therefore simulated the two-state model, equations 3.1 and 3.2, with a fixed target sequence by adding process noise (σ2w = 1 × 10−4) and measurement noise (σ2v = 3 × 10−4). Each learning curve was noisy, but their average clearly demonstrated multiple-timescale behavior. Therefore, even when a fixed target sequence is used, it is possible to detect multiple timescales if there are enough independent runs with independent noise.
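A sketch of this simulation (the gaussian tuning widths, retention factors, and trial counts below are illustrative choices, not the paper's fitted values):

```python
import numpy as np

n_dirs = 8
angles = np.arange(n_dirs) * 45.0

def gaussian_B(width_deg, peak):
    """Rotationally symmetric generalization matrix with gaussian tuning."""
    diff = np.abs(angles[:, None] - angles[None, :])
    diff = np.minimum(diff, 360.0 - diff)
    return peak * np.exp(-0.5 * (diff / width_deg) ** 2)

a_f, a_s = 0.90, 0.99
B_f, B_s = gaussian_B(30.0, 0.12), gaussian_B(30.0, 0.03)
sig_w, sig_v = np.sqrt(1e-4), np.sqrt(3e-4)

def run(targets, rng):
    """Multitarget two-state model: only the trained direction's error
    drives learning, spread to neighbors by the B matrices."""
    x_f, x_s = np.zeros(n_dirs), np.zeros(n_dirs)
    errors = np.empty(len(targets))
    for k, j in enumerate(targets):
        e = 1.0 - (x_f[j] + x_s[j]) + rng.normal(0.0, sig_v)
        errors[k] = e
        x_f = a_f * x_f + B_f[:, j] * e + rng.normal(0.0, sig_w, n_dirs)
        x_s = a_s * x_s + B_s[:, j] * e + rng.normal(0.0, sig_w, n_dirs)
    return errors

# One fixed pseudo-randomized target sequence shared across runs;
# only the noise differs between runs, as in the text.
seq_rng = np.random.default_rng(3)
targets = np.tile(seq_rng.permutation(np.repeat(np.arange(n_dirs), 4)), 8)
avg = np.mean([run(targets, np.random.default_rng(s)) for s in range(20)], axis=0)
```

Averaging the noisy learning curves over the 20 runs recovers a smooth curve whose early and late portions decay at visibly different rates.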
Generalization of the fast and slow processes in principle could be determined by directly fitting the two-state model to the experimental trial-by-trial errors. It was, however, difficult to fit the two-state model reliably due to the large number of parameters. Instead, we used the single-state model approach, described below, to evaluate trial-by-trial generalization.
4. Analysis of Trial-by-Trial Generalization Using a Single-State State-Space Model
5. Trial-by-Trial and Postadaptation Generalization Target the Fast and Slow Adaptation Processes, Respectively
Starting with the premise that learning has fast and slow processes, it is of interest to ask how each of these two processes generalizes. One way to compare generalization functions for the fast and slow processes would be to directly fit a two-state state-space model, equations 3.1 and 3.2, to experimental data and compute the generalization matrices (B(f) and B(s)). This is difficult to do, however, because of the high dimensionality of the state-space model with two states. The full model has 34 parameters (16 for initial conditions; 16 for generalization functions, assuming rotational symmetry; and 2 for the retention factors, assuming A(f) and A(s) to be diagonal with equal diagonal components).
Instead, we show formally that trial-by-trial generalization and postadaptation generalization are characterized by fast and slow processes, respectively. Specifically, using analysis and computational simulations, we show that trial-by-trial generalization of the state-space model with a single state arises mainly from fast learning, whereas the postadaptation generalization function derives from the slow process, provided that certain conditions regarding learning and retention speed are satisfied. This claim can be understood intuitively. Because the fast process has a larger gain than the slow process, the fast process will accordingly make larger corrections to trial errors. Trial-by-trial generalization reflects mainly the change in the fast process, namely, the B(f) matrix in equation 3.1. After completion of adaptation, the fast process will have largely decayed due to weak retention, leaving the residue of the slow process. Thus, postadaptation generalization reflects mainly the net change in the slow process, namely, the B(s) matrix in equation 3.1.
To confirm this analysis, we performed computational simulations with three combinations of narrow, intermediate, and broadly tuned B matrices (see Figure 8). In the following simulations, the retention-factor matrices A(f) and A(s) were assumed to be diagonal, with all diagonal values equal to a(f) and a(s), respectively. When parameterizing B, we assumed rotational symmetry, as for the single-state model, so each matrix was described by eight row components. We assumed that the diagonal components of the generalization matrices, which determine trial-by-trial generalization to the same direction, took a maximal value, and that the value of the off-diagonal components decreased as the angular difference between the learned and tested directions increased. The maximal values of the generalization matrices were fixed at 0.12 and 0.03, respectively, for the fast and slow processes. With these parameter values, the statistical tests described in section 2 identify two concurrent processes rather than one. Similar results were also observed for other parameter values satisfying b(f) > b(s) and (1 − a(s))−1b(s) > (1 − a(f))−1b(f). Artificial error data were generated using the two-state model with various gaussian widths (σ(f), σ(s)) for B(f) and B(s): (A) (σ(f), σ(s)) = (1°, 60°), (B) (σ(f), σ(s)) = (30°, 30°), and (C) (σ(f), σ(s)) = (60°, 1°). Our analysis indicated that the sums over the components of the B matrices, equation 4.1, determine how much the fast and slow processes contribute to trial-by-trial and postadaptation generalization, respectively. The values of b(f), b(s), (1 − a(f))−1b(f), and (1 − a(s))−1b(s) computed from the various gaussian widths are summarized in Table 1. The conditions b(f) > b(s) and (1 − a(s))−1b(s) > (1 − a(f))−1b(f) are satisfied for these three sets of gaussian widths.
Table 1: Sums over the components of the generalization matrices for the three combinations of gaussian widths.

| | b(f) | b(s) | (1 − a(f))−1b(f) | (1 − a(s))−1b(s) |
| --- | --- | --- | --- | --- |
| (A) (σ(f), σ(s)) = (1°, 60°) | 0.12 | 0.099 | 2.40 | 49.95 |
| (B) (σ(f), σ(s)) = (30°, 30°) | 0.20 | 0.050 | 4.012 | 25.05 |
| (C) (σ(f), σ(s)) = (60°, 1°) | 0.40 | 0.030 | 8.10 | 15.00 |
We first investigated which process contributes more to trial-by-trial generalization. The two-state model was trained with eight targets to produce artificial error data, and trial-by-trial generalization was then assessed using the single-state state-space model. The shapes of the trial-by-trial generalization functions were found to reflect mainly the width of the B matrix for the fast process (the middle row of Figure 8). We next investigated which process contributes most to postadaptation generalization. The two-state model was trained with a single target with the motor perturbation, and the degree of generalization was computed after the training was completed. As predicted, the width of the postadaptation generalization function mainly reflects the slow process (the bottom row of Figure 8). Trial-by-trial and postadaptation generalization have similar shapes only when the fast and slow processes have similar widths for their B matrices.
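The contrast can be reproduced with a compact noise-free simulation. Here the fast process is narrowly tuned (1°) and the slow process broadly tuned (60°), as in case A; the retention factors are illustrative. The one-trial update B(f) + B(s) stands in for trial-by-trial generalization (a proxy for the full single-state fit), while the converged state after single-target training gives postadaptation generalization.

```python
import numpy as np

n_dirs = 8
angles = np.arange(n_dirs) * 45.0

def gaussian_B(width_deg, peak):
    diff = np.abs(angles[:, None] - angles[None, :])
    diff = np.minimum(diff, 360.0 - diff)
    return peak * np.exp(-0.5 * (diff / width_deg) ** 2)

a_f, a_s = 0.90, 0.99
B_f = gaussian_B(1.0, 0.12)    # narrow fast process
B_s = gaussian_B(60.0, 0.03)   # broad slow process

# Trial-by-trial proxy: the state change per unit error on one trial is
# (B_f + B_s)[:, 0], dominated by the fast matrix (peak 0.12 vs. 0.03).
tbt = (B_f + B_s)[:, 0]

# Postadaptation: train direction 0 to convergence under a unit perturbation.
x_f, x_s = np.zeros(n_dirs), np.zeros(n_dirs)
for _ in range(2000):
    e = 1.0 - (x_f[0] + x_s[0])
    x_f = a_f * x_f + B_f[:, 0] * e
    x_s = a_s * x_s + B_s[:, 0] * e
post = x_f + x_s  # dominated by the slow state, whose gain b/(1 - a) is larger
```

At 90° from the trained direction, the normalized trial-by-trial response is nearly zero (narrow fast tuning), while normalized postadaptation generalization remains substantial (broad slow tuning), reproducing the dissociation in Figure 8.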
In summary, we conclude that if motor adaptation has a fast and a slow adaptation process, then trial-by-trial generalization will reflect the fast process and postadaptation generalization the slow process.
A recent study demonstrated that the decrease in relearning rate in A1BA2 paradigms, or anterograde interference, can be explained with a two-state model (Sing & Smith, 2010). Anterograde interference is defined as how the memory of task A1 interferes with the subsequent learning of task B. A two-state model attributes anterograde interference to the memory stored in the slow process, and the degree of anterograde interference increases with the duration of learning of task A1. These predictions from the two-state model were confirmed experimentally.
Our multitarget two-state model posits that the slow process contributes mainly to postadaptation generalization once adaptation is almost complete. The model also makes an interesting prediction: the pattern of postadaptation generalization should reflect that of the fast process if adaptation is prematurely terminated and should approach that of the slow process asymptotically. Therefore, in addition to the degree of anterograde interference, a buildup of the slow process should be manifest as a gradual transition in the shape of postadaptation generalization if the fast and slow systems have distinct generalization matrices.
We simulated anterograde interference and postadaptation generalization with various numbers of single-target adaptation trials. We considered a narrow (1°) fast process paired with a broad (90°) slow process, and vice versa. In the initial 10, 20, 40, or 160 trials, positive perturbations (+1) were imposed, and in subsequent trials, negative perturbations (−1) were delivered. The adaptation rate became slower as the number of initial adaptation trials increased, indicating anterograde interference from prior learning of the positive perturbation on subsequent learning of the negative perturbation (see Figure 9A). Accordingly, the shape of postadaptation generalization changed gradually from one mostly reflecting the fast process to one mostly reflecting the slow process (see Figures 9B and 9C).
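A minimal single-direction version of this simulation (the coefficients are illustrative; the tuning widths play no role once training is restricted to one direction):

```python
import numpy as np

a = np.array([0.90, 0.99])   # fast, slow retention
b = np.array([0.12, 0.03])   # fast, slow learning

def b_phase_errors(n_a, n_b=40):
    """Adapt to +1 for n_a trials, then reverse to -1; return the B-phase errors."""
    x = np.zeros(2)
    errors = []
    for k in range(n_a + n_b):
        f = 1.0 if k < n_a else -1.0
        e = f - x.sum()
        if k >= n_a:
            errors.append(e)
        x = a * x + b * e
    return np.array(errors)

# B-phase error asymptote of the closed-loop system
e_star = -1.0 / (1.0 + b[0] / (1 - a[0]) + b[1] / (1 - a[1]))

def remaining_fraction(errors, k=20):
    """Fraction of the initial B-phase error still unlearned after k trials."""
    return (errors[k] - e_star) / (errors[0] - e_star)

short, long_ = b_phase_errors(10), b_phase_errors(160)
# Longer A-phase training loads the slow state and retards relearning of B.
```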
Starting with the premise that motor adaptation is mediated by slow and fast processes (Smith et al., 2006), we showed analytically and with computational simulations that these two processes lead to double exponential learning curves and are reflected, respectively, in the postadaptation and the trial-by-trial measures of generalization. The question of whether the fast and slow processes had the same or different generalization functions was not addressed in the original two-rate adaptation model. Our formal demonstration suggests that psychophysical investigation using the two generalization measures can probe their corresponding neural representations.
6.1. Visuomotor Rotation Learning.
The generalization function for rotation learning had the same shape whether it was obtained by testing in untrained directions at the end of adaptation or derived from a state-space model with a single state (Tanaka et al., 2009). There are two potential explanations. The first is that rotation adaptation is single-rate. This is certainly the case for one-target rotation adaptation (Zarahn et al., 2008). The second is that multitarget rotation adaptation is two-rate. We found that double-exponential fits were significantly better than single-exponential fits to eight-target rotation adaptation data in our previous paper (Tanaka et al., 2009), consistent with an initial fast adaptation process followed by a slower one (Smith et al., 2006). This difference between one-target and multitarget learning could be biologically interesting in and of itself, or it could be an artifact of the parameter estimation procedures. Multitarget learning curves were assessed using 264 trials (Tanaka et al., 2009), but in single-target learning, only 80 trials were analyzed (Zarahn et al., 2008). As we noted, the AIC criterion prefers single-state models to two-state models when the number of error trials is insufficient (see equation 2.15 and section 2.3.1). Further experiments with different-sized rotations and different target numbers may help resolve this issue.
6.2. Force-Field Adaptation.
For force-field adaptation, different generalization functions have been described depending on the approach used to derive them. The postadaptation approach to generalization for viscous force-field learning found no generalization beyond 90 degrees from the training direction (Mattar & Ostry, 2007). In contrast, state-space models yielded a broader bimodal pattern of trial-by-trial generalization (Thoroughman & Shadmehr, 2000; Donchin et al., 2003). Our theoretical analysis suggests that this apparent contradiction arises because Mattar and Ostry (2007) probed the slow adaptation process, whereas the previous force-field generalization studies probed the fast process. This distinction indicates that in the case of force-field adaptation, fast and slow processes have distinct neural correlates with different directional tuning properties. Thus, force-field adaptation differs from rotation adaptation with respect to both the dimensionality of the state-space model and the widths of the generalization functions. This is in agreement with our previous contention that kinematic and dynamic adaptation are distinct learning processes (Krakauer, Ghilardi, & Ghez, 1999). It is also of interest that recent work has shown that the fast process in force-field adaptation may be mechanistically distinct from the slow process (Smith et al., 2006; Keisler & Shadmehr, 2010). Thus, differences in generalization for the two processes might be less surprising.
Further comment is required, however, with respect to the state-space modeling results for viscous curl force-field learning that have suggested substantial bimodality, with generalization to opposite movement directions reaching nearly 90% if invariant desired trajectories were assumed (Donchin et al., 2003). However, results from the same paper suggest less than 10% generalization to opposite movement directions if desired trajectories were assumed to be variable. State-space modeling results from Thoroughman and Shadmehr (2000) suggest about 27% generalization to opposite movement directions. A direct comparison between these results is difficult because the two papers adopted different definitions for their generalization functions, with motor error given in terms of either a scalar or a vector. Whether the disparity between these results arose from this methodological subtlety will require future confirmatory experiments.
6.3. Nonlinear Interactions.
We should emphasize that the state-space framework assumes that adaptation processes are linear time-invariant (LTI). It is interesting that under certain experimental conditions, adaptation paradigms do indeed generate several phenomena that are well predicted by LTI state-space models (Smith et al., 2006; Tanaka et al., 2009). However, adaptation experiments also yield phenomena that are not LTI. For example, we recently showed that savings can occur for rotation adaptation after complete washout (Zarahn et al., 2008), a finding that is incompatible with any LTI multirate model. Similarly, in previous work, we showed that interference can persist in the absence of aftereffects (Krakauer, Ghez, & Ghilardi, 2005). Thus, dissociation of savings and interference effects cannot be explained by current LTI state-space models. One proposed solution is that memories can be protected by contextual cues (Lee & Schweighofer, 2009); modulation by context is by definition nonlinear. Alternatively, qualitatively distinct memory processes may be operating over trials compared to those operating over longer time periods; that is, time and trial may have distinct effects, with only trials corresponding to LTI-based state-space models. We suggest that error-based LTI models capture short-term adaptation processes that are likely cerebellar based (Tseng, Diedrichsen, Krakauer, Shadmehr, & Bastian, 2007; Criscimagna-Hemminger, Bastian, & Shadmehr, 2010; Taylor & Ivry, 2011) and that deviations from LTI models represent the presence of additional learning processes, and not necessarily an error in the models themselves (Huang, Haith, Mazzoni, & Krakauer, 2011). It is to be hoped that the structure of behavioral deviations from the predictions of LTI state-space models will be informative and can lead to new experiments and the formulation of new models.
Appendix A: Trial-Based and Time-Based State-Space Models
We thank Eric Zarahn for his help in simulating how the two approaches to the study of generalization map onto the fast and slow adaptation processes, Maurice Smith for critical comments, and two anonymous reviewers for helpful comments. This research was supported by the Howard Hughes Medical Institute (T.J.S.) and NINDS R01 052804 (J.W.K.).