Two Determinants of Dynamic Adaptive Learning for Magnitudes and Probabilities

Abstract

Humans face a dynamic world that requires them to constantly update their knowledge. Each observation should influence their knowledge to a varying degree depending on whether it arises from a stochastic fluctuation or an environmental change. Thus, humans should dynamically adapt their learning rate based on each observation. Although crucial for characterizing the learning process, these dynamic adjustments have so far been investigated empirically only in magnitude learning. Another important type of learning is probability learning. The latter differs from the former in that individual observations are much less informative and a single one is insufficient to distinguish environmental changes from stochasticity. Do humans dynamically adapt their learning rate for probabilities? What determinants drive their dynamic adjustments in magnitude and probability learning? To answer these questions, we measured subjects' learning rate dynamics directly through real-time continuous reports during magnitude and probability learning. We found that subjects dynamically adapt their learning rate in both types of learning. After a change point, they increase their learning rate abruptly for magnitudes and in a more prolonged fashion for probabilities. These dynamics are driven differentially by two determinants: change-point probability, the main determinant for magnitudes, and prior uncertainty, the main determinant for probabilities. These results are fully in line with normative theory, both qualitatively and quantitatively. Overall, our findings demonstrate a remarkable human ability for dynamic adaptive learning under uncertainty and guide studies of the neural mechanisms of learning, highlighting different determinants for magnitudes and probabilities.

The slope of the linear regression indicates by how much the subject's estimate changes on average when the normative estimate changes by one unit. Many studies on human estimates, especially for probability judgments, have observed that the slope of the regression was less than 1 (Costello & Watts, 2014; Erev et al., 1994; Hilbert, 2012; Phillips & Edwards, 1966; Zhu et al., 2020). Consistent with these studies, we also observed that the slope was less than 1, in both tasks (see Table S2 for descriptive and inferential statistics). In the literature, this phenomenon has been referred to as a "conservatism bias" (Costello & Watts, 2014; Erev et al., 1994; Hilbert, 2012; Phillips & Edwards, 1966; Zhu et al., 2020), because a regression with a slope less than 1 predicts that, for a given level of normative estimate, the subject's estimate will on average be less close to the extremes (0 or 1, hence the 'conservatism' label), i.e. closer to 0.5, than the normative estimate. Here, we do not attach any particular mechanistic interpretation to the slope and treat it as a descriptive measure. For possible explanations of this phenomenon, see Costello & Watts (2014), Erev et al. (1994), Hilbert (2012), and Zhu et al. (2020).

Decomposition of the mean squared error.
As presented in the main text, we performed a decomposition of the mean squared error between the subjects' estimates and the normative estimates to quantify the proportion of the error that was attributable to systematic biases in their estimates rather than to their variance (see Results).
We also conducted an additional analysis to investigate the bias. Since we observed a regression slope less than 1, consistent with a "conservatism bias" (Table S2), we investigated the extent to which such a bias could explain the subjects' bias. Specifically, we quantified the amount of bias explained by a linear regression model fitted to the subjects; this model applies a linear transformation to the normative estimates and captures a conservatism bias when its slope is less than 1. We performed a linear regression between the normative estimates and the subjects' estimates averaged across the group, took the predictions of this regression as a model of the biased estimates, and then calculated the mean squared error obtained by replacing the normative estimates with the biased estimates. The proportion of the mean squared error that was reduced by using the biased estimates (i.e. the obtained reduction of the error in proportion to the original error) measures the amount of bias in subjects explained by the conservatism bias.
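The regression-based quantification described above can be sketched as follows. This is an illustrative sketch, not the authors' actual code: the function name, the use of NumPy's `polyfit`, and the input layout (one normative estimate and one group-averaged subject estimate per observation) are assumptions.

```python
import numpy as np

def conservatism_bias_explained(normative, subject_mean):
    """Proportion of the group-level MSE explained by a linear
    (conservatism-style) transformation of the normative estimates.

    normative    : normative model's estimate, one per observation
    subject_mean : subjects' estimates averaged across the group
    (Illustrative sketch; names and details are assumptions.)
    """
    normative = np.asarray(normative, dtype=float)
    subject_mean = np.asarray(subject_mean, dtype=float)

    # Fit the subjects' estimates as a linear function of the normative
    # estimates; a slope < 1 corresponds to a conservatism bias.
    slope, intercept = np.polyfit(normative, subject_mean, deg=1)
    biased = intercept + slope * normative  # regression predictions

    # Error against the normative estimates vs. error after replacing
    # them with the biased (regression-predicted) estimates.
    mse_original = np.mean((subject_mean - normative) ** 2)
    mse_biased = np.mean((subject_mean - biased) ** 2)

    # Proportional reduction of the error = amount of bias explained.
    return slope, (mse_original - mse_biased) / mse_original
```

When the subjects' mean estimates are exactly a linear transformation of the normative estimates, the regression absorbs all of the error and the explained proportion is 1; any residual (e.g. nonlinear bias or variance) lowers it.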
The full results of the decomposition (proportion of bias, of variance, and of conservatism bias) are reported in Table S3 below.
Table S3. Decomposition of the mean squared error between the subjects' estimates and the normative estimates. MSE: mean squared error.

Fig. S2. Dynamics of the normative model's learning rate after a change point, in the magnitude (A) and probability (B) learning tasks. The plots were obtained as in Fig. 2, but using the normative model's learning rate rather than the subjects'; the former was obtained by running the normative model on the same sequences as the subject.

Fig. S3. Distribution of the number of observations elapsed between two report updates made by subjects. A log scale was used for the number of observations as in Gallistel et al. (2014) for comparison (the equivalent distribution in Gallistel et al. is shown in their Fig. 11). In contrast to Gallistel et al. (2014), in our study updates were made on each observation most of the time (84% in the above distribution; mean of the distribution: 1.4 observations).

Fig. S4. The main results are similar and remain significant when excluding all data points where the subject did not make an overt update (i.e. learning rate = 0). After excluding these data points, we performed the same analyses as in previous figures and obtained the above plots: (A and B) equivalent to Fig. 2A and B; (C-H) equivalent to Fig. 5 (A-F). Stars denote statistical significance as in the main figures (see the legends of those figures for further details).

Fig. S5. The subjects' dynamic adjustments of the learning rate are not explained by learning noise. (A) Model of a delta-rule with learning noise. A noise sample is injected at each update of the model, which is otherwise governed by a delta-rule with parameter η. Two versions were tested for the noise level: (a) constant (parameter σε); (b) scaled to the magnitude of the prediction error (scaling factor parameter ζ). (B) Values of the model parameters for each subject, for each version of the model and each task. Each dot represents one subject. (C) Results obtained by simulating the model with the subject's parameters on the subject's sequences and performing the same learning rate analyses as those reported in the main results for subjects. Top plots show the results for the analysis corresponding to Fig. 2, bottom plots to Fig. 5.

Fig. S6. Subjects' performance was stable over the course of the task. Performance is measured by the accuracy of the estimates, quantified by the mean absolute error between the subject's estimate and the true value of the hidden quantity (the negative of the error was used so that higher values correspond to higher performance). Thin dots, and the lines connecting them, each denote one subject; large circles denote the mean across subjects.