Abstract
A hallmark of adaptation in humans and other animals is our ability to control how we think and behave across different settings. Research has characterized the various forms cognitive control can take—including enhancement of goal-relevant information, suppression of goal-irrelevant information, and overall inhibition of potential responses—and has identified computations and neural circuits that underpin this multitude of control types. Studies have also identified a wide range of situations that elicit adjustments in control allocation (e.g., those eliciting signals indicating an error or increased processing conflict), but the rules governing when a given situation will give rise to a given control adjustment remain poorly understood. Significant progress has recently been made on this front by casting the allocation of control as a decision-making problem. This approach has developed unifying and normative models that prescribe when and how a change in incentives and task demands will result in changes in a given form of control. Despite their successes, these models, and the experiments that have been developed to test them, have yet to face their greatest challenge: deciding how to select among the multiplicity of configurations that control can take at any given time. Here, we will lay out the complexities of the inverse problem inherent to cognitive control allocation, and their close parallels to inverse problems within motor control (e.g., choosing between redundant limb movements). We discuss existing solutions to motor control's inverse problems drawn from optimal control theory, which have proposed that effort costs act to regularize actions and transform motor planning into a well-posed problem. These same principles may help shed light on how our brains optimize over complex control configurations, while providing a new normative perspective on the origins of mental effort.
“There are many paths up the mountain, but the view from the top is always the same”
— Chinese Proverb
INTRODUCTION
Over the past half-century, our understanding of the human brain's capacity for cognitive control has grown tremendously (Friedman & Robbins, 2022; Menon & D'Esposito, 2022; von Bastian et al., 2020; Koch, Poljac, Müller, & Kiesel, 2018; Fortenbaugh, DeGutis, & Esterman, 2017; Abrahamse, Braem, Notebaert, & Verguts, 2016; Westbrook & Braver, 2015; Botvinick & Cohen, 2014). The field has developed consistent ways of defining and operationalizing control, such as in terms of its functions and what distinguishes different degrees of automaticity (Cohen, Servan-Schreiber, & McClelland, 1992; Shiffrin & Schneider, 1977; Posner & Snyder, 1975). It has developed consistent methods for eliciting control and measuring the extent to which control is engaged by a given task (von Bastian et al., 2020; Weichart, Turner, & Sederberg, 2020; Koch et al., 2018; Gonthier, Braver, & Bugg, 2016; Danielmeier & Ullsperger, 2011; Egner, 2007). It has demonstrated how such control engagement varies across individuals (von Bastian et al., 2020; Friedman & Miyake, 2017) and over the life span (Luna, 2009; Braver & Barch, 2002). Finally, research in this area has made substantial progress toward mapping the neural circuitry that underpins the execution of different forms of cognitive control (Friedman & Robbins, 2022; Menon & D'Esposito, 2022; Parro, Dixon, & Christoff, 2018; Shenhav, Botvinick, & Cohen, 2013). The factors that determine how cognitive control is configured have, on the other hand, remained mysterious and heavily debated (Shenhav et al., 2017).
Studies have uncovered reliable antecedents for control adjustments, including the commission of an error (Danielmeier & Ullsperger, 2011; Rabbitt, 1966) or changes in task demands (Gratton, Coles, & Donchin, 1992; Logan & Zbrodoff, 1979). However, it has been a longstanding goal for the field to develop a comprehensive model of how people use the broader array of information they monitor to configure the broader array of control signals they can deploy. To address this question, models have proposed that the problem of determining control allocation can be solved through a general decision-making process that involves weighing the costs and benefits of potential control allocations (Lieder, Shenhav, Musslick, & Griffiths, 2018; Verguts, Vassena, & Silvetti, 2015; Westbrook & Braver, 2015; Shenhav et al., 2013). These models have already shown promise in accounting for how people adjust individual control signals (e.g., how much to adjust attention toward a particular task) based on the incentives and demands of a given task environment (Bustamante, Lieder, Musslick, Shenhav, & Cohen, 2021; Lieder et al., 2018; Musslick, Shenhav, Botvinick, & Cohen, 2015; Verguts et al., 2015). Here, we focus on a different aspect of this problem: How is it that people navigate the multitude of solutions that can match the demands of their environment? How can cognitive control scale to configuring the complex information processing we deploy throughout our daily life? What is the relationship of mental effort to the multiplicity of options for configuring control? Building off well-characterized computational models from motor planning, we examine how multiplicity presents a critical challenge to cognitive control configuration, and how algorithmic principles from motor control can help to overcome these challenges and refine our understanding of goal-directed cognition.
THE MULTIPLICITY OF COGNITIVE CONTROL
To study the mechanisms that govern the allocation of cognitive control, researchers have sought to identify reliable predictors of changes in control allocation within and across experiments. These triggers for control adjustment have in turn provided insight into signals—such as errors and processing conflict—that the brain could monitor to increase or decrease control. Research has shown that control adjustments induced by these signals, even within the same setting, vary not only in degree but also in kind (see Table 1).
Table 1. Control adjustments elicited by different monitored signals, as reflected in behavior, cognitive process models (DDM), and neural measures.

| Signal | Behavior | Cognitive Process (DDM) | Neuroscience |
| --- | --- | --- | --- |
| Errors | RT ↑ (Danielmeier et al., 2011; King et al., 2010; Jentzsch & Dudschig, 2009; Debener et al., 2005; Gehring & Fencsik, 2001; Rabbitt, 1966) | Threshold ↑ (Fischer et al., 2018; Dutilh et al., 2012) | Motor cortex activation ↓ (Danielmeier et al., 2011; King et al., 2010) |
| | Error rate ↓ (Danielmeier et al., 2011; Maier et al., 2011; Marco-Pallarés, Camara, Münte, & Rodríguez-Fornells, 2008; Laming, 1968, 1979) | | |
| | Interference ↓ (Steinhauser & Andersen, 2019; Maier et al., 2011; King et al., 2010; Ridderinkhof, 2002) | Distractor drift rate ↓ (Fischer et al., 2018) | Target-related activation ↑ (Steinhauser & Andersen, 2019; Danielmeier et al., 2011; Maier et al., 2011; King et al., 2010); Distractor-related activation ↓ (Fischer et al., 2018; Danielmeier et al., 2011; King et al., 2010) |
| Conflict | RT ↑ (Herz, Zavala, Bogacz, & Brown, 2016; Verguts et al., 2011) | Threshold ↑ (Fontanesi et al., 2019; Herz et al., 2016) | STN activation ↑ (Frank et al., 2015; Wiecki & Frank, 2013; Ratcliff & Frank, 2012; Cavanagh et al., 2011; Aron, 2007) |
| | Interference ↓ (Braem, Verguts, Roggeman, & Notebaert, 2012; Danielmeier et al., 2011; Funes et al., 2010; Kerns, 2006; Ullsperger et al., 2005; Kerns et al., 2004; Gratton et al., 1992) | Distractor drift rate ↓ (Ritz & Shenhav, 2021) | Target-related activation ↑ (Egner et al., 2007; Egner & Hirsch, 2005) |
| Incentives | RT ↓, Accuracy ↑ (Frömer et al., 2021; Chiew & Braver, 2016; Ličen et al., 2016; Yee, Krug, Allen, & Braver, 2016; Fröber & Dreisbach, 2014; Soutschek et al., 2014) | Threshold ↑ (Leng et al., 2021; Dix & Li, 2020; Thurm, Zink, & Li, 2018); Threshold ↓ (Leng et al., 2021) | |
| | Target effect ↑ (Adkins & Lee, 2021; Krebs et al., 2010) | Drift rate ↑ (Jang et al., 2021; Leng et al., 2021; Dix & Li, 2020) | Target-related activation ↑ (Grahek et al., 2021; Etzel et al., 2016; Soutschek et al., 2015) |
| | Distractor effect ↓ (Chiew & Braver, 2016; Soutschek et al., 2014; Padmala & Pessoa, 2011) | Target drift rate ↑ (Ritz & Shenhav, 2021) | Distractor-related activation ↓ (Padmala & Pessoa, 2011) |
| | RT variability ↓ (Esterman et al., 2014, 2016) | Accumulation noise ↓ (Ritz et al., 2020; Manohar et al., 2015) | Sustained task-relevant activation ↑ (Esterman et al., 2017) |
Error-related Control Adjustments
In common cognitive control tasks such as the Stroop, Simon, and Eriksen flanker tasks (von Bastian et al., 2020; Egner, 2007), participants have prepotent biases that often lead to incorrect responses (e.g., responding based on the salient flanking arrows rather than the goal-relevant central arrow). Errors thus serve as a useful indicator that the participant was likely underexerting control and should adjust their control accordingly (Yeung, Botvinick, & Cohen, 2004). The best-studied instantiation of error-related control adjustments manifests in a participant's tendency to respond more slowly and more accurately after an error (Danielmeier & Ullsperger, 2011; Laming, 1979; Rabbitt, 1966), which can be understood as together reflecting post-error adjustments in caution. Indeed, in work using models like the drift diffusion model (DDM; Ratcliff & McKoon, 2008; Ratcliff, 1978; see Figure 1A), post-error slowing and post-error increases in accuracy can be jointly accounted for by an increase in one's response threshold, the criterion one sets for how much evidence to accumulate about the task stimuli before deciding how to respond (Fischer, Nigbur, Klein, Danielmeier, & Ullsperger, 2018; Dutilh et al., 2012).
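To make this threshold account concrete, the following minimal simulation (our illustration; all parameter values are arbitrary rather than fit to data) shows that raising the response threshold alone, with no change in drift rate, reproduces the post-error signature of slower but more accurate responding.

```python
import numpy as np

def simulate_ddm(drift, threshold, noise=1.0, dt=0.001, n_trials=2000, seed=0):
    """Simulate a DDM: accumulate noisy evidence from 0 until it crosses
    +threshold (correct response) or -threshold (error)."""
    rng = np.random.default_rng(seed)
    rts = np.empty(n_trials)
    correct = np.empty(n_trials, dtype=bool)
    for i in range(n_trials):
        x, t = 0.0, 0.0
        while abs(x) < threshold:
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        rts[i], correct[i] = t, x > 0
    return rts.mean(), correct.mean()

# Post-error caution modeled purely as a threshold increase (illustrative values):
for label, a in [("pre-error (low threshold)", 0.8),
                 ("post-error (high threshold)", 1.4)]:
    mean_rt, acc = simulate_ddm(drift=1.0, threshold=a)
    print(f"{label}: mean RT = {mean_rt:.2f} s, accuracy = {acc:.2f}")
```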
Experiments investigating the neural implementation of these post-error adjustments have found that threshold adjustments are associated with the suppression of motor-related activity (Fischer et al., 2018; Danielmeier, Eichele, Forstmann, Tittgemeyer, & Ullsperger, 2011). For instance, Danielmeier et al. (2011) had participants perform a Simon-like task that required them to respond based on the color of an array of dots that were moving in a direction compatible or incompatible with the correct color response. When participants responded incorrectly, they tended to be slower and more accurate on the following trial. This increased caution was coupled with decreased BOLD activity in motor cortex on that subsequent trial, consistent with the possibility that errors led to controlled adjustments of decision threshold (in this case by putatively lowering the baseline activity to require more evidence before responding).
In addition to changing overall caution, errors can also influence how specific stimuli are processed. Studies have shown that error trials can be followed by selective enhancement of task-relevant (target) processing (Steinhauser & Andersen, 2019; Danielmeier et al., 2011, 2015; Maier, Yeung, & Steinhauser, 2011; King, Korb, von Cramon, & Ullsperger, 2010) and/or suppression of task-irrelevant (distractor) processing (Fischer et al., 2018; Danielmeier et al., 2011, 2015). For instance, in the same study by Danielmeier et al. (2011), errors tended to be followed by increased activity in regions encoding the target stimulus dimension and decreased activity in regions encoding the distractor dimension (see also the works of Fischer et al., 2018; King et al., 2010). Thus, whereas post-error slowing effects reflect control over one's decision threshold, such post-error reductions of interference likely reflect a different form of control, one that adjusts the influence of target- and distractor-related information on the evidence that is accumulated before reaching that threshold (target and distractor contributions to the drift rate in the DDM).
Conflict-related Control Adjustments
In addition to error commission, another potential indicator of insufficient control is the presence of processing conflict (Botvinick, Braver, Barch, Carter, & Cohen, 2001; Berlyne, 1957), such as when a person feels simultaneously drawn to respond left (e.g., based on target information) and right (e.g., based on a distractor). One of the best-studied forms of conflict-related control adjustment is the conflict adaptation or congruency sequence effect, which manifests as reduced sensitivity to response (in)congruency after a person has previously performed one or more high-conflict (e.g., incongruent) trials (Jiang & Egner, 2014; Funes, Lupiáñez, & Humphreys, 2010; Egner, Delano, & Hirsch, 2007; Egner & Hirsch, 2005; Gratton et al., 1992). These adaptations are analogous to examples of post-error reductions of interference described above and have the same candidate computational underpinnings in adjustments to the rate of evidence accumulation (Musslick, Cohen, & Shenhav, 2019; Musslick et al., 2015; Kerns et al., 2004). These control adjustments have likewise been found to be associated with changes in task-specific processing pathways (Egner, 2008; Egner, Delano, & Hirsch, 2007). For example, Egner and Hirsch (2005) showed that participants were less sensitive to Stroop incongruence after higher-conflict trials, and that this was coupled with increased activity in the target-associated cortical areas (fusiform face area for face targets).
Another body of work has shown that conflict can trigger changes to response threshold, particularly within a trial, for instance when selecting between two similarly valued options (Fontanesi, Gluth, Spektor, & Rieskamp, 2019; Frank et al., 2015; Wiecki & Frank, 2013; Ratcliff & Frank, 2012; Cavanagh et al., 2011; Verguts, Notebaert, Kunde, & Wühr, 2011; Aron, 2007). These adjustments have been linked to interactions between dorsal anterior cingulate cortex and the subthalamic nucleus (Wessel, Waller, & Greenlee, 2019; Frank et al., 2015; Brittain et al., 2012; Cavanagh et al., 2011; Schroeder et al., 2002). For instance, simultaneous EEG-fMRI has revealed that BOLD activity in dorsal anterior cingulate cortex and mediofrontal EEG theta power moderate the relationship between decision conflict and adjustments to response threshold (Frank et al., 2015).
Incentive-related Control Adjustments
In addition to signals like error and conflict that reflect dips in performance, the need for control can also be signaled by the presence of performance-based incentives (e.g., monetary rewards for good performance). Incentives can influence overall performance—for instance, often leading participants to perform tasks faster and more accurately across trials (Parro et al., 2018; Yee & Braver, 2018). Incentives can also trigger task-specific adjustments of cognitive control, enhancing the processing of goal-relevant information (Etzel, Cole, Zacks, Kay, & Braver, 2016; Soutschek, Strobach, & Schubert, 2014; Krebs, Boehler, & Woldorff, 2010) and/or suppressing the processing of distractor information (Padmala & Pessoa, 2011), likely reflecting changes in associated drift rates similar to error-related adjustments discussed above (cf. Ritz & Shenhav, 2021, discussed further below). Also similar to error-related findings, there is evidence that incentive-related control adjustments are mediated by changes in processing within stimulus-selective circuits (Hall-McMaster, Muhle-Karbe, Myers, & Stokes, 2019; Esterman, Poole, Liu, & DeGutis, 2017; Etzel et al., 2016; Soutschek, Stelzel, Paschke, Walter, & Schubert, 2015; Padmala & Pessoa, 2011). For example, Padmala and Pessoa (2011) used a Stroop task to show that participants are less sensitive to distractor information when under performance-contingent rewards. They found that this distractor inhibition was mediated by reduced activation in cortical areas sensitive to the distracting stimuli (visual word form area for text distractors).
Performance incentives have been shown to influence not only how well one performs on a given trial but also how consistently one performs within and across trials. When performing sustained attention tasks that require participants to repeat the same response on most trials (e.g., frequent go trials) but respond differently on rare occurrences of a different trial type (e.g., infrequent no-go trials), attentional lapses can manifest as increased variability in response times across trials (Fortenbaugh et al., 2017). When performance is incentivized, participants demonstrate both higher accuracy and lower response time variability (Esterman et al., 2014, 2016, 2017). These performance improvements can be accounted for by assuming that incentives influence control over how noisily evidence is accumulated within each trial (e.g., because of mind-wandering; Ritz, DeGutis, Frank, Esterman, & Shenhav, 2020; Manohar et al., 2015). Neuroimaging studies suggest that enacting the control required to achieve more consistent (less variable) performance is associated with increases in both sustained and evoked responses in domain-general attentional networks and stimulus-specific regions (Esterman et al., 2017).
Multidimensional Configuration of Cognitive Control
Previous research has uncovered a multiplicity of adjustments that occur in response to changes in the demands or incentives for control. Importantly, these findings show that a monitored signal (e.g., an error) can produce several different control adjustments and that a control adjustment (e.g., increased caution) can be elicited by several different monitored signals. Rather than reflecting a strict one-to-one mapping between monitored signals and control adjustments, this diversity suggests that participants make simultaneous decisions across multiple control effectors.
This control multiplicity is evident in studies of post-error adjustments discussed above (Danielmeier & Ullsperger, 2011), in which errors can result in both increased caution (i.e., more conservative response thresholds) and a change in attentional focus to favor target over distractor information (putatively underpinned by adjustments in drift rate). Experiments have found that both adjustments appear to occur simultaneously (Fischer et al., 2018; Danielmeier et al., 2011, 2015; King et al., 2010), reflecting a multifaceted response to the error event.
In a recent experiment, we showed that people can also exert independent control over their processing of targets and distractors (Ritz & Shenhav, 2021). As in Danielmeier et al. (2011), participants responded to a random dot kinematogram based on dot color, while ignoring dot motion. Across trials, we parametrically varied both the target coherence (how easily the correct color could be identified) and distractor interference (how coherently dots were moving in the same or opposite direction as the target response). We found that participants exerted control over their processing of both target and distractor information, but that they did so independently and differentially depending on the relevant task demands. Under performance incentives, participants preferentially enhanced their target sensitivity, whereas after high-conflict trials, participants preferentially suppressed their distractor sensitivity (and, to a lesser extent, also enhanced target sensitivity). A similar pattern has been observed at the neural level while participants perform a Stroop task (Soutschek et al., 2015). Whereas performance incentives preferentially enhanced sensitivity in target-related areas (visual word form area for text targets), conflict expectations preferentially suppressed sensitivity in distractor-related areas (fusiform face area for face distractors). These findings demonstrate that control can be flexibly reconfigured across multiple independent control signals to address relevant incentives and task demands.
There is also evidence that different people prioritize different control strategies within the same setting. For instance, Boksem, Meijman, and Lorist (2006) had participants perform the Simon task over an extended experimental session and observed performance fatigue in the form of slower and less accurate responding over time. Toward the end of the session, the experimenters introduced monetary incentives and found that this counteracted the effects of fatigue, but did so heterogeneously across the group. After making an error during this incentivized period, some participants focused more on responding quickly, whereas others focused on responding accurately. The engagement of these differential control strategies was associated with changes in distinct ERPs (error-related negativity vs. contingent negative variation). Similar variability in reliance on different control strategies has been seen across the life span (Ritz et al., 2020; Fortenbaugh et al., 2015; Luna, 2009; Braver & Barch, 2002) and between clinical and healthy populations (Grahek, Shenhav, Musslick, Krebs, & Koster, 2019; Lesh, Niendam, Minzenberg, & Carter, 2011; Casey et al., 2007).
Collectively, previous research suggests that there is a many-to-many mapping between the information that participants monitor related to task demands, performance, and incentives, and the multitude of control signals that participants can deploy. Recent theoretical models have explained this heterogeneity in terms of the flexible deployment of control, proposing that there is an intervening decision process that integrates monitored information, determining which strategies to engage, and to what extent, based on the current situation (Lieder et al., 2018; Verguts et al., 2015; Shenhav et al., 2013).
SELECTION AND CONFIGURATION OF MULTIVARIATE CONTROL
Casting control allocation as a decision process provides a path toward addressing how people integrate information from their environment to select the optimal control allocation. This process of optimization entails finding the best solution for an objective function and set of constraints. Objective functions define the costs and benefits of different solutions, whereas soft constraints (e.g., costs) and hard constraints (e.g., boundary conditions) limit the space of possible solutions. Optimization has long played a central and productive role in building computational accounts of multivariate planning in the domain of motor control (Shadmehr & Ahmed, 2020; Wolpert & Landy, 2012; Todorov & Jordan, 2002; Uno, Kawato, & Suzuki, 1989; Flash & Hogan, 1985), suggesting that this research into how the brain coordinates actions may offer general principles for how the brain coordinates cognition.
The starting point for solving any optimization problem is identifying the objective function. Researchers in decision-making and motor control have suggested that participants maximize the amount of reward harvested per unit time (reward rate; Manohar et al., 2015; Shadmehr, Orban de Xivry, Xu-Wilson, & Shih, 2010; Niv, Daw, Joel, & Dayan, 2007; Harris & Wolpert, 2006). Studies have found that people's motor actions are sensitive to incentives, with faster and/or more accurate movement during periods when they can earn more rewards (Adkins, Lewis, & Lee, 2022; Codol, Forgaard, Galea, & Gribble, 2021; Sukumar, Shadmehr, & Ahmed, 2021; Codol, Holland, Manohar, & Galea, 2020; Yoon, Jaleel, Ahmed, & Shadmehr, 2020; Manohar, Muhammed, Fallon, & Husain, 2019; Manohar, Finzi, Drew, & Husain, 2017; Manohar et al., 2015; Pekny, Izawa, & Shadmehr, 2015; Trommershäuser, Maloney, & Landy, 2003a, 2003b). For example, participants will saccade toward a target location more quickly and more precisely on trials that are worth more money (Manohar et al., 2015, 2017, 2019). Responding faster and more accurately breaks the traditional speed-accuracy trade-off (Manohar et al., 2015; Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006) and is thought to reflect the use of control to optimize both reward and duration (Shadmehr & Ahmed, 2020).
It has been similarly proposed that a core objective of cognitive control allocation is also the maximization of reward rate (Lieder et al., 2018; Boureau, Sokol-Hessner, & Daw, 2015; Manohar et al., 2015; Shenhav et al., 2013; Bogacz et al., 2006). That is, people select how much and what kinds of control to engage at a given time based on how control will maximize expected payoff (e.g., performance-based incentives like money or social capital) while minimizing the time it takes to achieve that payoff. Consistent with this proposal, studies have shown that people configure information processing (e.g., adjust their response thresholds) in ways that maximize reward rate (Balci et al., 2011; Starns & Ratcliff, 2010; Simen et al., 2009) and that they adjust this configuration over time based on local fluctuations in reward rate (Otto & Daw, 2019; Guitart-Masip, Beierholm, Dolan, Duzel, & Dayan, 2011).
We recently used a reward-rate optimization framework to make model-based predictions for how people coordinate multiple types of control (Leng, Yee, Ritz, & Shenhav, 2021). Participants performed a Stroop task that was self-paced, enabling them to dynamically adjust at least two forms of control: their overall drift rate (governing both how fast and accurate they are) and their response threshold (governing the extent to which they trade off speed for accuracy; Figure 1A). We varied the amount of money participants could gain with each correct response and the amount they could lose with each incorrect response. Participants could increase their response threshold to guarantee that every response was correct, but this came at the cost of completing fewer trials and therefore earning fewer rewards over the course of the experiment. Increasing drift rate could also achieve higher reward rates, but is subject to effort costs, which we will return to later. The reward-rate optimal configuration across both drift and threshold would be to increase drift rate and decrease threshold for larger rewards, and to increase threshold for larger penalties (Figure 1B). Critically, we found that participants' DDM configuration matched the predictions of this optimal model (Figure 1C). These results provide evidence that participants' performance can align with the optimal joint configuration across multiple control parameters.
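The closed-form DDM performance expressions (Bogacz et al., 2006) make this kind of optimization easy to sketch. The code below is a toy illustration of the general logic, not the model fit in Leng et al. (2021): it grid-searches drift rate and threshold for the reward-rate-maximizing configuration, with an assumed quadratic effort cost on drift and invented payoff and timing values.

```python
import numpy as np

def ddm_performance(v, a, sigma=1.0):
    """Closed-form error rate and mean decision time for an unbiased DDM
    with drift v, boundaries at +/-a, and diffusion noise sigma."""
    er = 1.0 / (1.0 + np.exp(2.0 * a * v / sigma**2))
    dt = (a / v) * np.tanh(a * v / sigma**2)
    return er, dt

def reward_rate(v, a, reward=1.0, penalty=0.0, t0=0.7, effort=0.05):
    """Expected payoff per unit time, minus a quadratic effort cost on drift."""
    er, dt = ddm_performance(v, a)
    return (reward * (1.0 - er) - penalty * er - effort * v**2) / (dt + t0)

drifts = np.linspace(0.1, 4.0, 80)
bounds = np.linspace(0.1, 3.0, 80)
for penalty in [0.0, 2.0]:  # small vs. large penalty for errors
    rr = np.array([[reward_rate(v, a, penalty=penalty) for a in bounds]
                   for v in drifts])
    i, j = np.unravel_index(rr.argmax(), rr.shape)
    print(f"penalty = {penalty}: optimal drift = {drifts[i]:.2f}, "
          f"optimal threshold = {bounds[j]:.2f}")
```

In this toy setting, raising the error penalty shifts the optimum toward a higher threshold, while the effort term caps how much drift is worth recruiting.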
These studies validate the proposal that control allocation can be framed as decision-making over multidimensional configurations of control (i.e., combinations of different control types engaged to different degrees) and that these decisions seek to optimize an objective function such as expected reward rate. The DDM is useful for studying these configuration processes, as it provides a well-defined cognitive process model with criteria for good performance. Similar optimality analyses have also been performed in domains like working memory (Sims, 2015; Sims, Jacobs, & Knill, 2012), demonstrating the generality of this approach. However, for all the algorithmic tools it provides, this decision-making framework also presents an entirely new set of challenges. Most notably, the sheer number of possible control configurations to choose from often means that there will be multiple equivalent solutions to this decision. Here, again, valuable insights can be gained from research on motor control, where these challenges and their potential solutions have been extensively explored.
INVERSE PROBLEMS IN MOTOR AND COGNITIVE CONTROL
Inverse Problems in Motor Control
Some of the most influential computational modeling of motor planning was founded at the Central Labor Institute in Moscow in the early 20th century. This group formalized for the first time a fundamental problem for motor control: How does the motor system choose among the many similar actions that could be taken to achieve a goal (Whiting, 1983; Bernstein, 1935/1967)? This problem centers on the fact that motor control is inherently ill-posed: With more degrees of freedom in the body (e.g., joints) than in the task space, the system must select the best motor action among many equivalent options.
These motor redundancies can occur in several domains of motor planning (Kawato, Maeda, Uno, & Suzuki, 1990). At the task level, there may be many trajectories through the task space that achieve the same goals, such as the paths a hand could take on its way to picking up a cup (Task Degeneracy; Figure 2A). At the effector level, there are often more degrees of freedom in the skeletomotor system than in the task space, creating an “inverse kinematics” problem for mapping from goals on to actions (Effector Degeneracy; Figure 2B). For example, there are many ways you could move your arm to trace a line with the tip of your finger. A related problem arises when there is redundancy across effectors, such as in agonist and antagonist muscles (Effector Antagonism; Figure 2C). Because of their opponency, the same action can occur by trading off the contraction of one muscle against the relaxation of the other. These inverse problems have been a major challenge for theoretical motor control, and to the extent that similar problems occur in cognitive control, solutions from the motor domain may help guide our understanding of ill-posed cognitive control.
Inverse Problems in Cognitive Control: The Algorithmic Level
Considering the massive number of degrees of freedom in neural information processing systems, cognitive control is a prime candidate for inverse problems of its own. To illustrate this, we can return to the example of how people decide to allocate control across parameters of the DDM (Figure 2D–F). As reviewed above, participants can separately control individual parameters of evidence accumulation, specifically drift rate (Bond, Dunovan, Porter, Rubin, & Verstynen, 2021; Ritz & Shenhav, 2021), threshold (Fischer et al., 2018; Cavanagh & Frank, 2014), and accumulation noise (Mukherjee, Lam, Wimmer, & Halassa, 2021; Ritz et al., 2020; Nakajima, Schmitt, & Halassa, 2019). This test case of finding a reward-rate optimal configuration of DDM parameters faces the same set of challenges as those outlined above from motor control.
First, just as there are many hand trajectories that can produce a desired outcome, there are also many ways to produce good decision-making performance (Figure 2D). Different combinations of accuracy (numerator) and RT (denominator) can trade off to produce the same reward rate. This creates an equivalence in the task space between different performance outcomes with regard to the goals of the system.
Second, just as there are more degrees of freedom in the arm than in many motor tasks, there is more flexibility in information processing than in many cognitive tasks. For example, the same patterns of behavior (and therefore expected reward rates) can result from different configurations of DDM parameters (Bogacz et al., 2006; Figure 2E). From a model-fitting perspective, this forces researchers to limit the parameters they attempt to infer from behavior, fixing at least one parameter value (often accumulation noise), while estimating the others (Bogacz et al., 2006; Ratcliff & Rouder, 1998). This degeneracy similarly limits a person's ability to perform the “mental model-fitting” required to optimize across all these control configurations when deciding how to allocate control. These difficulties are exacerbated in more biologically plausible models of evidence accumulation like the leaky competing accumulator (Usher & McClelland, 2001), which introduce additional parameters (e.g., related to memory decay and levels of inhibition across competing response units), resulting in even greater parameter degeneracy (Miletić, Turner, Forstmann, & van Maanen, 2017). A similar trade-off exists in the classic debate between early and late attentional selection, namely, whether attention operates closer to sensation or closer to response selection (Driver, 2001). Given that attention appears to operate at multiple processing stages (Lavie, Hirst, de Fockert, & Viding, 2004), degeneracies will arise in conditions under which early and late attentional control produce similar changes in task performance.
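This parameter degeneracy can be demonstrated in a few lines of code: in the closed-form DDM expressions used earlier, scaling drift, threshold, and noise by a common factor leaves predicted behavior untouched, which is exactly why model fitters must fix one of these parameters. A quick numerical check (our illustration):

```python
import numpy as np

def ddm_performance(v, a, sigma):
    """Closed-form DDM error rate and mean decision time (as above)."""
    er = 1.0 / (1.0 + np.exp(2.0 * a * v / sigma**2))
    dt = (a / v) * np.tanh(a * v / sigma**2)
    return er, dt

base = ddm_performance(v=1.0, a=1.0, sigma=1.0)
for k in [0.5, 2.0, 10.0]:
    # Scale drift, threshold, and accumulation noise together...
    scaled = ddm_performance(v=k, a=k, sigma=k)
    # ...and the predicted accuracy and speed are identical:
    print(f"k = {k:>4}: same behavior as base? {np.allclose(base, scaled)}")
```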
Third, just as there is antagonism across motor effectors, there is also antagonism across cognitive processes. That is, even when the algorithmic goal is clear, there are degenerate control signals that can achieve this goal. For instance, in typical interference-based paradigms (e.g., flanker or Stroop), participants must respond to one element of a stimulus while ignoring information that is irrelevant and/or distracting. To increase the overall rate of accumulation of goal-related information, a person can engage two different forms of attentional control: enhance targets or suppress distractors. Utilizing either of these strategies will improve performance, meaning that the cognitive controller could trade off enhancing targets or suppressing distractors to reach the same level of performance (Figure 2F). Recent work has shown that target and distractor processing can be controlled independently in conflict tasks (Adkins et al., 2022; Ritz & Shenhav, 2021; Evans & Servant, 2020), creating an ill-posed problem of coordinating across these strategies.
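The same point holds at the level of attentional strategies. As a toy illustration (the variable names and values here are ours): if the goal-directed drift rate is a gain-weighted combination of target and distractor evidence, then a whole continuum of target-enhancement and distractor-suppression settings produces the identical drift rate, and hence identical performance.

```python
import numpy as np

target_input, distractor_input = 1.0, 0.5  # illustrative stimulus strengths
desired_drift = 1.2                        # performance the controller wants

# drift = g_target * target_input - g_distractor * distractor_input
# Every (g_target, g_distractor) pair on this line is behaviorally equivalent:
for g_target in np.linspace(1.2, 2.2, 5):
    g_distractor = (g_target * target_input - desired_drift) / distractor_input
    drift = g_target * target_input - g_distractor * distractor_input
    print(f"g_target = {g_target:.2f}, g_distractor = {g_distractor:.2f} "
          f"-> drift = {drift:.2f}")
```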
Inverse Problems in Cognitive Control: The Implementational Level
Optimally configuring a decision process is difficult, facing several challenges that are similar to those that occur when planning a motor action. In the case of algorithmic cognitive models, parameter degeneracy (e.g., in the DDM) and process degeneracy (e.g., the target-distractor trade-off) make it difficult to optimally configure information processing. However, problems at this level of analysis reflect the best-case scenario, as these cognitive models are themselves often intended to be lower-dimensional representations of the underlying neural processes (Bogacz, 2007). At the implementational level, cognitive control operates over the complex neural instantiation of these algorithms, further exacerbating the ill-posed nature of the control problem.
One domain in which there can be redundancy in neural control is at the stage of processing at which control is applied, mirroring debates about early and late attentional selection highlighted above. Previous work has suggested that control can influence “early” sensory processing (Adam & Serences, 2021; Egner & Hirsch, 2005) and “late” processing in PFC (Mante, Sussillo, Shenoy, & Newsome, 2013; Stokes et al., 2013). To the extent that interventions along processing pathways have a similar influence on performance for a given task, there is a dilemma of where to allocate control.
The difficulty in deciding “where” to allocate control is magnified as the control targets move from macroscale processing pathways to local configurations of neural populations. For example, a controller may need to configure a small neural network to produce a specific spiking profile in response to inputs. Complicating this goal, it has been shown that a broad range of cellular and synaptic parameters produce very similar neuron- and network-level dynamics at the scale of only a few units (Goaillard & Marder, 2021; Alonso & Marder, 2019; Marder & Goaillard, 2006; Prinz, Bucher, & Marder, 2004). For example, very different configurations of sodium and potassium conductances can produce very similar bursting profiles (Golowasch, Goldman, Abbott, & Marder, 2002), analogous to the redundancy of antagonistic muscles. These findings demonstrate that even simple neural networks face an ill-posed configuration problem, highlighting additional challenges to the biological implementation of cognitive control. Despite this degeneracy, research on brain–computer interfaces has shown that animals can exert fine-grained control over neural populations. Animals are capable of evoking arbitrary activity patterns to maximize reward (Athalye, Carmena, & Costa, 2019), even at the level of controlling single neurons (Patel, Katz, Kalia, Popovic, & Valiante, 2021; Prsa, Galiñanes, & Huber, 2017).
Across these different scales of implementation, the optimization of neural systems faces a core set of inverse problems: There are many macroscale configurations that map similarly onto task goals, and there are many microscale configurations that map similarly on to local dynamics. This problem is closely related to the long-debated issue of multiple realizability in philosophy of science, which, in its applications to neuroscience, has explored the lack of one-to-one mapping between neural and mental phenomena (e.g., whether pain is identical to “C fiber” activity; Putnam, 1967). The lack of one-to-one mappings between structure and function poses not only an inferential problem to scientists and philosophers but also an optimization problem to a brain's control system.
The Problem with Inversion
As we've outlined above, the core difficulty in specifying cognitive control signals comes from situations in which the brain needs to map a higher-dimensional control configuration on to a lower-dimensional task space, particularly when there is redundancy in this mapping (Figure 3). This class of problems has been extensively explored in applied mathematics (Willcox, Ghattas, & Heimbach, 2021; Calvetti & Somersalo, 2018; Evans & Stark, 2002; Engl, Hanke, & Neubauer, 1996), and this field has developed helpful formalisms and solutions to the problems faced by the brain. We can first consider the forward problem, where a brain forecasts what would happen if it adopted a specific control configuration. For example, the controller may predict how performance will change if it raises its decision threshold. This problem generally has a unique solution, as a specific configuration will usually produce a specific result even if there is redundancy. Furthermore, projecting from a higher-dimensional configuration to a lower-dimensional outcome will compress the output, resulting in a stable solution.
However, the goal in optimization is to solve the inverse problem, in this case inferring which control configurations will produce a desired task state. As discussed earlier, this problem is generally ill-posed (Hadamard, 1902) because there are multiple redundant solutions for implementing cognitive control. The problem is also ill-posed because it projects a lower-dimensional outcome into a higher-dimensional configuration (Calvetti & Somersalo, 2018; Engl et al., 1996). For example, the controller may optimize reward rate, but to do so must configure many potential neural targets. Because outcomes are noisy (e.g., noisy estimates of values due to sampling error or imperfect forecasting), projection into a higher-dimensional control space will amplify this noise. In this regime, small changes in values or goals can produce dramatically different control configurations, leading to an unstable optimization process. Without compensatory measures, these features of ill-posed cognitive control would impede the brain's ability to effectively achieve goals.
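Both pathologies (redundancy and noise amplification) can be seen in miniature with a linear forward model: a wide matrix mapping a high-dimensional control configuration onto a low-dimensional outcome. The following is a generic numerical illustration, not a model of any specific neural system, in which one outcome direction is made only weakly controllable:

```python
import numpy as np

rng = np.random.default_rng(1)

# Forward map from 10 control "knobs" to 2 task outcomes, constructed so that
# one outcome direction is only weakly controllable (singular values 1 and 1e-3):
U, _ = np.linalg.qr(rng.standard_normal((2, 2)))
Vh = np.linalg.qr(rng.standard_normal((10, 10)))[0][:2]
A = U @ np.diag([1.0, 1e-3]) @ Vh

u = rng.standard_normal(10)
y = A @ u                                     # forward problem: unique and stable

# Redundancy: adding any null-space component leaves the outcome unchanged.
u_min = np.linalg.pinv(A) @ y
null_basis = np.linalg.svd(A)[2][2:]          # 8 directions A cannot "see"
u_alt = u_min + null_basis.T @ rng.standard_normal(8)
print("two different configurations, same outcome:",
      np.allclose(A @ u_min, y), np.allclose(A @ u_alt, y))

# Instability: tiny noise in the desired outcome blows up the naive inverse.
noise = 1e-3 * rng.standard_normal(2)
du = np.linalg.pinv(A) @ (y + noise) - u_min
print(f"outcome perturbation {np.linalg.norm(noise):.1e} -> "
      f"configuration change {np.linalg.norm(du):.2f}")
```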
This fundamental challenge of inferring the actions that will achieve goals has long been a central one within research on computational motor control (McNamee & Wolpert, 2019). Thankfully, these inverse problems can be made tractable through well-established modifications to the optimization process (Engl et al., 1996; Tikhonov, 1963). Motor theorists have leveraged these solutions to help explain action planning, and in doing so have provided insight into the nature of effort costs.
SOLVING THE INVERSE PROBLEM
Motor Solutions to the Inverse Problem
A major innovation in theoretical motor control was to reframe the motor control problem as an optimization problem. Under this perspective, actions optimize an objective function over the duration of the motor action (similar to the reward rate used for decision optimization). For scientists who took this approach, a primary focus was to understand people's objective functions and, in particular, the costs that constrain people's actions. Researchers proposed that people place a cost on jerky movements (Flash & Hogan, 1985), muscle force (Uno et al., 1989; Nelson, 1983; Chow & Jacobson, 1971), or action-dependent noise (Harris & Wolpert, 1998), and therefore try to minimize one or more of these while pursuing their goals. A core difference between these accounts was whether costs depended on movement trajectories (Flash & Hogan, 1985) or muscle force (Uno et al., 1989), with the latter better explaining bodily constraints on actions (e.g., because of range of movement).
It now appears that actions are constrained by a muscle-force-dependent cost (Morel, Ulbrich, & Gail, 2017; Diedrichsen, Shadmehr, & Ivry, 2010; O'Sullivan, Burdet, & Diedrichsen, 2009; Uno et al., 1989) and likely also by endpoint noise (O'Sullivan et al., 2009; Todorov, 2005; Harris & Wolpert, 1998). However, it remains unclear whether these effort costs stem from physiological factors like metabolism, or whether they reflect a more general property of the decision process. Although metabolism would be an obvious candidate for these effort costs, researchers have found that subjective effort appraisals are largely uncorrelated with information being signaled by bodily afferents (Marcora, 2009). Furthermore, whereas metabolic demands should increase linearly with muscle force (Szentesi, Zaremba, van Mechelen, & Stienen, 2001), effort costs are better accounted for by a quadratic relationship (Shadmehr & Ahmed, 2020; Diedrichsen et al., 2010).
These discrepancies suggest that motor effort may not depend solely on energy expenditure but also on properties of the optimization process (e.g., related to the anticipated control investment). A promising explanation for these effort costs may arise from the solution to motor control's ill-posed inverse problem. A central method for solving ill-posed problems is to constrain the solution space through regularization (e.g., by placing costs on higher intensities of muscle force), a role that motor control theorists have proposed for effort costs (Kawato et al., 1990; Jordan, 1989). For example, across all motor plans that would produce equivalent performance outcomes, there is only one solution that also expends the least effort. From this perspective, motor effort enables better planning by creating unique global solutions to degenerate planning problems.
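In the linear sketch above, this is exactly Tikhonov regularization: adding a quadratic penalty on control intensity selects, from the infinitely many configurations that satisfy the goal, the unique least-effort one, and it simultaneously stops the inversion from amplifying noise. A minimal continuation of the earlier example (again, purely illustrative):

```python
import numpy as np

def regularized_inverse(A, y, lam):
    """Tikhonov solution: argmin_u ||A @ u - y||^2 + lam * ||u||^2.
    As lam -> 0 this recovers the minimum-norm (least-effort) exact solution."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((2, 2)))
Vh = np.linalg.qr(rng.standard_normal((10, 10)))[0][:2]
A = U @ np.diag([1.0, 1e-3]) @ Vh   # the same nearly redundant map as before
y = rng.standard_normal(2)

u_reg = regularized_inverse(A, y, lam=0.1)
print("effort (squared norm) of regularized solution:", round(u_reg @ u_reg, 4))

# The quadratic penalty also stabilizes the inversion: the same perturbation
# that previously swung the pseudo-inverse now barely moves the solution.
du = regularized_inverse(A, y + 1e-3 * rng.standard_normal(2), lam=0.1) - u_reg
print(f"configuration change under small goal noise: {np.linalg.norm(du):.5f}")
```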
Regularization as a Solution to Ill-posed Cognitive Control Selection
Much like motor control, cognitive control must solve a degenerate inverse problem, and like motor control, it is subjectively costly (McGuire & Botvinick, 2010; Kahneman, 1973). For example, participants will forego money (Westbrook, Kester, & Braver, 2013) and even accept pain (Vogel, Savelson, Otto, & Roy, 2020) to avoid more cognitively demanding tasks. If physical effort regularizes degenerate motor planning, then it is plausible that cognitive effort similarly regularizes degenerate cognitive planning. Recasting physical and mental effort as a regularization cost brings these domains in line with a wide range of related psychological phenomena. For example, inferring depth from visual inputs is also an ill-posed problem, and this inference has been argued to depend on regularization (Bertero, Poggio, & Torre, 1988; Poggio, Koch, & Brenner, 1985; Poggio, Torre, & Koch, 1985).
Recent proposals have drawn connections between cognitive effort and regularization under a variety of theoretical motivations. For instance, it has been proposed that cognitive effort enhances multitask learning (Musslick, Saxe, Hoskin, Reichman, & Cohen, 2020; Kool & Botvinick, 2018), where effort costs regularize toward task-general policies (“habits”) that enable better transfer learning. It has also been proposed, based on principles of efficient coding (Zénon, Solopchuk, & Pezzulo, 2019), that effort costs enable compressed and more metabolically efficient stimulus-action representations. Finally, effort costs have been motivated from the perspective of model-based control (Piray & Daw, 2021), where regularization toward a default policy allows for more efficient long-range planning. These accounts offer different perspectives on the benefits of regularized control, complementing motor control's emphasis on solving ill-posed inverse problems.
Regularization in inverse problems has a normative Bayesian interpretation, in which constraints come from prior knowledge about the solution space (Calvetti & Somersalo, 2018). This Bayesian perspective has been influential in modeling ill-posed problems like inferring knowledge from limited exemplars (Tenenbaum, Kemp, Griffiths, & Goodman, 2011; Tenenbaum, Griffiths, & Kemp, 2006) and planning sequential actions (Botvinick & Toussaint, 2012; Friston, Samothrakis, & Montague, 2012; Solway & Botvinick, 2012). Regularization and Bayesian inference have thus been a productive approach for understanding how people solve ill-posed problems in cognition and action. Within this Bayesian framework, effort costs can be recast in terms of shrinkage toward a prior, providing further insight into how a regularization perspective could inform cognitive control. If there are priors on cognitive or neural configurations, such as automatic processes like habits, then regularized control would penalize deviations from those defaults.
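This equivalence can be stated in one line. Using notation of our own choosing: with a Gaussian likelihood for achieving outcome $y$ under configuration $u$ and a Gaussian prior centered on a default (e.g., habitual) configuration $u_0$, the maximum a posteriori configuration is exactly a Tikhonov-regularized solution,

$$
\hat{u} \;=\; \arg\max_{u}\, p(y \mid u)\,p(u)
\;=\; \arg\min_{u}\, \|y - f(u)\|^{2} + \lambda\,\|u - u_{0}\|^{2},
\qquad \lambda = \sigma_{\text{obs}}^{2} / \sigma_{\text{prior}}^{2},
$$

where $f$ is the forward model mapping configurations to outcomes. A tighter prior (smaller $\sigma_{\text{prior}}^{2}$) corresponds to a steeper effort cost on deviating from the default configuration.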
A Bayesian perspective on the relationship between automaticity and control costs makes an interesting and counterintuitive prediction: When people's priors are to exert high levels of control, they will find it difficult to relax their control intensity. Research on control learning supports these predictions. A large body of work has found that participants learn to exert more control when they expect a task to be difficult (Jiang, Beck, Heller, & Egner, 2015; Bugg & Chanani, 2011; Yu, Dayan, & Cohen, 2009; Logan & Zbrodoff, 1979) or when stimuli are associated with conflict (Bugg & Hutchison, 2013; Bugg & Crump, 2012). This results in the allocation of excessive and maladaptive levels of control when a trial turns out to be easy (Logan & Zbrodoff, 1979). A recent experiment by Bustamante et al. (2021) extended these findings by showing how biases in control exertion can emerge through feature-specific reward learning. Participants performed a color-word Stroop task in which they could choose to either name the color (more control-demanding) or read the word (less control-demanding). They learned that certain stimulus features would yield greater reward for color-naming and other features would yield greater reward for word-reading. Critically, during a subsequent transfer phase, participants had trouble learning to adaptively disengage control when faced with a combination of stimulus features that had each previously predicted greater reward for greater effort. That is, they had learned to overexert control. It remains to be determined whether this overexertion stems from effort mobilization or from control priors that make color-naming less effortful (Athalye et al., 2019; Yu et al., 2009).
This work highlights connections between control theory and forms of reinforcement learning that have been well-characterized within the cognitive sciences, whereby an agent is presumed to select actions (or sequences of actions) that maximize their expected long-term reward (Collins, 2019; Neftci & Averbeck, 2019; Sutton & Barto, 2018). Indeed, the parallels between these two modeling frameworks are rich, most notably in that both seek to optimize goal-directed behavior by optimizing the Bellman equation (a formula for estimating an action's expected future payoff; Anderson & Moore, 2007; Kalman, 1960). One way in which these traditions often differ is that control theory traditionally emphasizes prospective, model-based planning of a feedback policy over a continuous state space, whereas reinforcement learning usually focuses on gradually learning an action policy over a discrete state space (Recht, 2018). Reinforcement learning could speculatively intersect with cognitive control by learning the control priors highlighted above (complementing use-based automaticity [Miller, Shenhav, & Ludvig, 2019] and evolutionary priors [Cisek, 2019; Zador, 2019]), or could be involved in learning higher-level control policies (e.g., learning a sequence of subgoals; Frank & Badre, 2012).
ALGORITHMS FOR MOTOR AND COGNITIVE CONTROL
Motor and cognitive control appear to solve similar problems (action-outcome inversion), plausibly through similar computational principles (regularized optimization). The next logical step is to ask whether cognitive control has developed algorithmic solutions to this inversion similar to those of the motor control system. A longstanding gold-standard algorithm for modeling motor actions is the linear quadratic regulator (LQR), which plays a central role in the optimal feedback control theory of motor planning (Haar & Donchin, 2020; Shadmehr & Krakauer, 2008; Todorov & Jordan, 2002). Given the success of optimal feedback control in the motor domain, this algorithm provides a promising candidate for understanding the planning and execution of cognitive actions.
LQR can provide the optimal solution to sequential control problems when two specific criteria are met. First, the system under control must have linear dynamics, such as a cruise controller that adjusts the speed of a car. Second, the control process must be optimizing a quadratic objective function. This usually involves minimizing both the squared goal error (e.g., the squared deviation from desired speed) and the squared control intensity (e.g., the squared motor torque). Under these conditions, LQR provides an analytic (i.e., closed-form) solution to the optimal policy, avoiding the curse of dimensionality (Van Rooij, 2008). LQR is formally dual to the Kalman filter for optimal inference (Todorov, 2008; Kalman & Bucy, 1961), and the linear quadratic Gaussian algorithm combines inference and control for computationally tractable optimal behavior under state uncertainty (Yeo, Franklin, & Wolpert, 2016; Todorov, 2005).
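For concreteness, here is a minimal finite-horizon LQR solver, a textbook backward Riccati recursion; the "cruise control" dynamics and cost weights below are invented for illustration.

```python
import numpy as np

def lqr_gains(A, B, Q, R, horizon):
    """Finite-horizon LQR for x_{t+1} = A x_t + B u_t, minimizing
    sum_t (x_t' Q x_t + u_t' R u_t). Returns feedback gains K_t for
    the optimal policy u_t = -K_t x_t (backward Riccati recursion)."""
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # reorder from t = 0 to t = horizon - 1

# Toy cruise control: the state is the deviation from the desired speed.
A = np.array([[0.98]])  # deviation persists (decays slightly) on its own
B = np.array([[0.10]])  # effect of throttle input
Q = np.array([[1.0]])   # quadratic goal error
R = np.array([[0.5]])   # quadratic control intensity (the "effort" term)

x = np.array([5.0])     # start 5 units below the desired speed
for K in lqr_gains(A, B, Q, R, horizon=20):
    u = -K @ x          # closed-form optimal feedback, no search required
    x = A @ x + B @ u
print(f"deviation after 20 steps: {float(x[0]):.2f}")
```

Note how the quadratic control penalty R plays the same regularizing role as the effort costs discussed above: larger R yields gentler corrections spread over more time steps.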
In the domain of motor control, LQR empirically captures participants' motor trajectories (Yeo et al., 2016; Stevenson, Fernandes, Vilares, Wei, & Kording, 2009; Todorov & Jordan, 2002), particularly in the case where there are mid-trajectory perturbations to goals or effectors (Takei, Lomber, Cook, & Scott, 2021; Nashed, Crevecoeur, & Scott, 2012; Knill, Bondada, & Chhabra, 2011; Diedrichsen, 2007; Liu & Todorov, 2007). A striking example of the power of this model to capture behavior was observed in an experiment on motor coordination (Diedrichsen, 2007). Participants performed a reaching task in which the goal either depended on both arms (e.g., rowing), or where each arm had a separate goal (e.g., juggling). During the reach, the experimenters perturbed one of the arms and found that participants compensated with both arms only when they were both involved in the same goal. In LQR, this goal-dependent coordination arises because of the algorithm's model-based feedback control, with squared effort costs favoring distributing the work across goal-relevant effectors. Accordingly, this study found that LQR simulations accurately captured participants' reach trajectories. Furthermore, participants' behavior also confirmed a key prediction of LQR, namely, that noise correlations between arms will be task-specific, constraining control to the goal-relevant dimensions of the task manifold (the “minimal intervention principle”; Todorov & Jordan, 2002).
A starting point for developing algorithmic links between cognitive and motor control is to consider whether cognitive control is a problem that is well-suited for LQR. The first prediction from LQR is that the dynamics between cognitive states are approximately linear. One measure of these dynamics comes from task switching, in which participants switch between multiple stimulus-response rules (“task sets”; Monsell, 2003). Researchers have found that these transitions between task sets are well-captured by linear dynamics (Musslick & Cohen, 2021; Musslick, Bizyaeva, Agaron, Leonard, & Cohen, 2019; Steyvers, Hawkins, Karayanidis, & Brown, 2019). For example, when participants are given a variable amount of time to prepare for a transition between two tasks (e.g., responding based on letters vs. digits), the stereotypical switch cost of slower responding after a task switch compared with a task repetition decreases with greater preparation time (Rogers & Monsell, 1995). A simple reanalysis of this pattern shows that switch costs can be well-captured by a linear dynamical model (Figure 4A). Although switching to the “letter” and “digit” tasks incurred different initial and asymptotic performance costs, both appear to exhibit a similar rate of change.
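The reanalysis described here amounts to fitting a first-order linear dynamical system in which the distance to asymptotic performance shrinks by a constant fraction per unit of preparation time. The sketch below fits that model to synthetic data standing in for the Rogers and Monsell (1995) pattern; all numerical values are invented.

```python
import numpy as np
from scipy.optimize import curve_fit

def switch_cost(prep_time, initial, asymptote, rate):
    """First-order linear dynamics: the switch cost decays exponentially
    from its initial value toward an asymptote as preparation time grows."""
    return asymptote + (initial - asymptote) * np.exp(-rate * prep_time)

prep = np.array([0.1, 0.2, 0.4, 0.6, 0.9, 1.2])  # preparation time (s)
rng = np.random.default_rng(0)
# Two tasks with different offsets but a shared decay rate (synthetic data):
cost_letter = switch_cost(prep, 220.0, 60.0, 3.0) + rng.normal(0, 5, prep.size)
cost_digit = switch_cost(prep, 180.0, 40.0, 3.0) + rng.normal(0, 5, prep.size)

for name, cost in [("letter", cost_letter), ("digit", cost_digit)]:
    (init, asym, rate), _ = curve_fit(switch_cost, prep, cost, p0=(200, 50, 2))
    print(f"{name}: initial = {init:.0f} ms, asymptote = {asym:.0f} ms, "
          f"rate = {rate:.1f}/s")
```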
Linear dynamics have also been observed in attentional adjustments that occur within a trial of a given task. For instance, recent work has shown that performance on an Eriksen flanker task can be accounted for by a DDM variant in which initially broad attention narrows within a trial to focus primarily on the central target, resulting in a shift from the drift rate being initially dominated by the flankers to being primarily dominated by the target (Weichart et al., 2020; Servant, Montagnini, & Burle, 2014; White, Ratcliff, & Starns, 2011). Using the dot motion task described earlier, we recently showed that these within-trial dynamics can be further teased apart into target-enhancing and distractor-suppressing elements of feature-based attention, each with its own independent dynamics (Ritz & Shenhav, 2021). These dynamics were well-captured by an accumulation model that regulated feature gains with a linear feedback control law (Figure 4B).
A second prediction from LQR is that cognitive effort costs are quadratic. There are two lines of evidence that support this prediction. One line of evidence comes from studies of cognitive effort discounting, which examine how people explicitly trade off different amounts of reward (e.g., money) against different levels of cognitive effort (e.g., n-back load). These studies quantify the extent to which different levels of effort are treated as a cost when making those decisions (i.e., how much reward is discounted by this effort), and many of them find that quadratic effort discounting captures choice the best among their tested models (Figure 4C; Petitet, Attaallah, Manohar, & Husain, 2021; Massar, Pu, Chen, & Chee, 2020; Vogel et al., 2020; Białaszek, Marcowski, & Ostaszewski, 2017; Soutschek et al., 2014; although see also the works of Hess, Lothary, O'Brien, Growney, & DeLaRosa, 2021; McGuigan, Zhou, Brosnan, & Thyagarajan, 2019; Chong et al., 2017). A second line of evidence supporting quadratic costs is found in tasks that require participants to hold a stimulus in working memory (e.g., a Gabor patch of a given orientation) and then reproduce that stimulus after a delay period. Errors on this task tend to be approximately Gaussian (Sprague, Ester, & Serences, 2016; Ma, Husain, & Bays, 2014; van den Berg, Shin, Chou, George, & Ma, 2012; Bays & Husain, 2008; Wilken & Ma, 2004), consistent with the predictions of ideal observer models that incorporate a quadratic loss function (Sims, 2015; Sims et al., 2012; Figure 4D).
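In discounting terms, the quadratic prediction is commonly parameterized as a subjective value SV(R, E) = R − kE², with k a fitted individual cost weight (our notation; the cited studies differ in their details). A brief illustration:

```python
def subjective_value(reward, effort, k=0.8):
    """Quadratic effort discounting: value falls with the square of demanded
    effort (k is an illustrative individual cost weight)."""
    return reward - k * effort**2

# A fixed reward loses value steeply as demanded effort (e.g., n-back load) rises:
for effort in range(1, 5):
    print(f"effort level {effort}: SV = {subjective_value(5.0, effort):.2f}")
```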
Recent work has begun to make explicit links between LQR and the neural implementation of cognitive control. Most notably, Bassett and colleagues have used LQR to model the large-scale control of brain networks (e.g., Tang & Bassett, 2018). This approach uses LQR modeling of whole-brain network dynamics to understand the ability of subnetworks to reconfigure macroscale brain states (Braun et al., 2021; Gu et al., 2015, 2021; Betzel, Gu, Medaglia, Pasqualetti, & Bassett, 2016; see also Yan et al., 2017). For instance, in an fMRI experiment using the n-back task, Braun et al. (2021) used an LQR model to infer that the brain requires more control to maintain a stable 2-back state than a 0-back state, as well as more control to transition from a 0-back state into a 2-back state than vice versa. Interestingly, individual differences in these model-derived estimates of stability and flexibility were associated with differences in dopamine genotype, dopaminergic receptor blockade, and schizophrenia diagnosis (Braun et al., 2021). LQR modeling has similarly been applied to directly recorded neural activity to understand how local connectivity influences control demands (Athalye et al., 2021; Stiso et al., 2019), with accompanying theories of how these configuration processes are learned through reinforcement learning (Athalye et al., 2019).
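To illustrate the style of computation these studies employ, the sketch below solves a finite-horizon LQR problem for generic linear network dynamics x[t+1] = A x[t] + B u[t] and accumulates the control energy required to drive the network toward a target state. The connectivity matrix, input matrix, and cost weights are random placeholders rather than quantities estimated from neural data.

```python
import numpy as np

# Finite-horizon LQR for linear network dynamics x[t+1] = A x[t] + B u[t],
# in the spirit of network control models of fMRI (e.g., Gu et al., 2015).
rng = np.random.default_rng(1)
n, m, T = 8, 3, 50                      # nodes, control inputs, horizon
A = rng.standard_normal((n, n)) * 0.2   # placeholder (stable) connectivity
B = rng.standard_normal((n, m))         # placeholder control input matrix
Q = np.eye(n)                           # quadratic distance-to-target cost
R = np.eye(m)                           # quadratic control-energy cost

# Backward Riccati recursion yields the time-varying feedback gains K[t].
P = Q.copy()
gains = []
for _ in range(T):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
    gains.append(K)
gains.reverse()  # gains[0] now corresponds to t = 0

# Drive the network from a random initial state toward the origin (target).
x = rng.standard_normal(n)
energy = 0.0
for K in gains:
    u = -K @ x                  # linear feedback control law
    energy += float(u @ u)      # accumulated control energy
    x = A @ x + B @ u
print(f"final distance to target = {np.linalg.norm(x):.3f}, energy = {energy:.2f}")
```

In the neuroimaging applications above, analogous energy quantities (computed from empirically estimated A matrices) index how demanding it is to maintain or transition between task states.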
Conclusions and Future Directions
The second half of the 20th century saw a wave of progress on mathematical models for optimal control problems in applied mathematics. A second wave, in computational motor control, followed closely, combining rigorous measurement of motor actions with normative models from this new optimal control theory (Todorov & Jordan, 2002; Uno et al., 1989; Flash & Hogan, 1985; Nelson, 1983; Chow & Jacobson, 1971). Recently, a third wave of cognitive control research has extended optimal control principles to goal-directed cognition (Musslick & Cohen, 2021; Piray & Daw, 2021; Lieder et al., 2018; Tang & Bassett, 2018; Shenhav et al., 2013, 2017; Yu et al., 2009; Bogacz et al., 2006). This work has sought to formalize the principles that tie these frameworks together, highlighting how cognitive control research can learn from decades of computational motor control research. These principles have the potential to inform the theoretical development of, and focused empirical investigation into, the architecture of goal-directed cognition. As behavioral tasks, statistical techniques, and neuroimaging methods improve our measurements of how the brain configures information processing, theoretical constraints will be essential for asking the right questions.
One insight that arises from casting cognitive control as regularized optimization is that the control costs underlying apparent “failures” of control do not necessarily stem from cognitive limitations (e.g., a limited capacity to engage multiple control signals). Instead, these costs can arise from the flexibility of cognition, enabling a complex brain to optimize over degenerate control actions. Under this framework, effort costs help solve the decision problem of how to configure control. One productive application of this perspective may be to help shed light on why people differ in how they configure these multivariate signals, for instance, prioritizing some forms of control over others. A regularization perspective would emphasize understanding different people's priors (e.g., perceptions of their own abilities; Shenhav, Fahey, & Grahek, 2021; Bandura, 1977) and configural redundancy when accounting for people's mental effort costs.
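A toy example makes this point concrete: when two control signals contribute redundantly to a single performance target, performance alone leaves the configuration underdetermined, and a quadratic effort cost (a ridge penalty) selects a unique minimum-effort configuration. The two-signal setup and the numbers below are purely illustrative.

```python
import numpy as np

# Two control signals (e.g., gains on drift rate and threshold) contribute
# redundantly to one performance target: any u with c @ u == target is equally
# good for performance alone, so the unregularized problem is ill-posed.
c = np.array([1.0, 1.0])   # how each signal contributes to performance
target = 2.0

# Minimizing (c @ u - target)**2 + lam * ||u||**2 has a unique closed-form
# solution (ridge regression with a single constraint), which breaks the tie.
lam = 0.1
u = c * target / (c @ c + lam)
print(f"selected configuration: {u}, performance = {c @ u:.2f}")
```

As the effort weight shrinks toward zero, the selected configuration approaches the minimum-norm solution; larger weights trade performance away for greater effort savings.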
There are several important avenues for building further on the promising theoretical and empirical foundations that have been recently established in the study of multivariate control optimization. For instance, it will be important to understand how effort's role in solving the inverse problem trades off against other proposed benefits like generalization (Musslick et al., 2020; Kool & Botvinick, 2018) and efficiency (Zénon et al., 2019). It will also be important to develop finer-grained connections between computational theories of regularized cognitive control and the algorithmic and implementational theories of how the brain performs control optimization and execution. For instance, to what extent can specific regularized control algorithms such as LQR explain the dynamics of cognitive control optimization and deployment? How does the cognitive control system integrate across multiple monitored signals of goal progress and achievement (Haar & Donchin, 2020), including different forms of errors and conflict (Ebitz & Platt, 2015; Shen et al., 2015)? While LQR modeling has been a powerful approach for understanding the role of neural connectivity in goal-driven brain dynamics, more work is needed to bridge these findings to cognitive models of control optimization and specification.
In addition to understanding the computational goals of cognitive control optimization, it will be equally important to understand how biological control algorithms deviate from optimality. A substantial body of research has characterized apparent deviations from optimality during judgment and decision-making in the form of heuristics and biases (Kahneman, 2003; Tversky & Kahneman, 1974). Such seemingly irrational behaviors have been accounted for within decision frameworks by formalizing the rational bounds on optimality (Bhui & Xiang, 2021; Gershman & Bhui, 2020; Lieder & Griffiths, 2019; Parpart, Jones, & Love, 2018; Lieder, Hsu, & Griffiths, 2014; Lieder, Griffiths, & Goodman, 2012; Simon, 1955). The LQR algorithm may similarly reflect bounded optimality, as LQR is suboptimal when its linear-quadratic assumptions are a poor match to a task. A cognitive control system that uses LQR could reflect a trade-off between better computational tractability and poorer worst-case performance. Future research should incorporate the heuristics, biases, and approximations that influence cognitive control into models of control planning.
Progress on these questions will in turn require more precise estimates of the underlying control processes. The study of motor control has benefited immensely from high-resolution measurements of motor effectors, for instance tracking hand position during reaching. Analogous measures of cognitive control are much more difficult to acquire, in part because they require inference from motor movements (e.g., response time) and/or patterns of activity within neural populations whose properties are still poorly understood and are typically measured with limited spatiotemporal resolution. Future experiments should combine computational modeling with spatiotemporally resolved neuroimaging to understand the implementation of different types of control. In addition to addressing core questions at the heart of multivariate control optimization, such methodological improvements will also help us better understand the heterogeneity of multivariate effort. For instance, an untested assumption implied by existing theoretical frameworks is that all forms of cognitive control will incur subjective costs in a similar fashion, for instance that higher levels of drift rate and higher levels of threshold will both be experienced as effortful (cf. Shenhav et al., 2013). Although there is consistent evidence that enhancements to drift rate incur a cost, it remains less clear whether adjustments to response threshold incur a cost over and above the reductions to reward rate they can cause (cf. Leng et al., 2021). Further research is needed to examine this question and to explore both the magnitude and functional form of these cost functions across a wider array of control signals, especially with respect to deviations from participants' default configurations.
Cognitive control is extremely complex and flexible, and it operates primarily over latent processes like decision-making; these features make it challenging to study. Thankfully, we can gain better traction on this inference problem by drawing from the rich empirical and theoretical traditions of better-constrained fields like motor control (Broadbent, 1977). The normative principles of optimal control theory, which have proven so fruitful in motor control, can similarly help inform our theories and investigations of cognitive control. Although our cognition will certainly diverge from these normative theories, these approaches can provide a core foundation for understanding how we control our thoughts and actions.
Acknowledgments
Special thanks to Laura Bustamante, Romy Frömer, and the rest of the Shenhav Lab for helpful discussions on these topics.
Reprint requests should be sent to Harrison Ritz, Cognitive, Linguistic, and Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, RI, or via e-mail: [email protected].
Funding Information
This work was supported by the Training Program for Interactionist Cognitive Neuroscience.
Xiamin Leng, National Institutes of Health (https://dx.doi.org/10.13039/100000002), grant number: T32-MH115895. Amitai Shenhav, National Institutes of Health (https://dx.doi.org/10.13039/100000002), grant number: R01MH124849. Amitai Shenhav, National Science Foundation (https://dx.doi.org/10.13039/100000001), grant number: 2046111.
Diversity in Citation Practices
Retrospective analysis of the citations in every article published in this journal from 2010 to 2021 reveals a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience (JoCN) during this period were M(an)/M = .407, W(oman)/M = .32, M/W = .115, and W/W = .159, the comparable proportions for the articles that these authorship teams cited were M/M = .549, W/M = .257, M/W = .109, and W/W = .085 (Postle and Fulvio, JoCN, 34:1, pp. 1–3). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article's gender citation balance. The authors of this article report its proportions of citations by gender category to be M/M = .716, W/M = .142, M/W = .066, W/W = .077.
Notes
1. Note that the DDM shares properties with several other evidence accumulation models that enable similar behavioral predictions and, in some cases, finer-grained predictions about neural implementation (Bogacz, 2007). We focus on the DDM as a reference point throughout much of this article because its properties have been closely studied from both theoretical and empirical perspectives and it lends itself well to mechanistic hypotheses; our attributions to this model and its parameters should be seen as potentially generalizable to related models.
2. We will use the term “monitored signal” to refer to signals that act as inputs to decisions about control allocation. In contrast, we use “control signals” to refer to the control that is allocated as a result of this decision process (analogous to “motor commands”; Shenhav et al., 2013).
3. The analytic solutions to these algorithms rely on ordinary least squares solutions for optimizing quadratic loss functions and on Gaussian identities describing how quadratic loss functions change under linear dynamics. For in-depth mathematical treatments, see Recht (2018), Shadmehr and Krakauer (2008), and Anderson and Moore (2007).
4. A concern about effort discounting is that it ought to be estimated on the basis of cognitive demands rather than nominal task demands. Notably, participants consistently show quadratic effort discounting in the n-back task, a domain in which there is a well-characterized linear relationship between task load and PFC activity (Braver et al., 1997).
REFERENCES
Author notes
This article is part of a Special Focus entitled, Perspectives from the 2021 recipients of the Cognitive Neuroscience Society's Young Investigator Award, Dr. Anne Collins and Dr. Amitai Shenhav.