## Abstract

Response inhibition is a widely studied aspect of cognitive control that is particularly interesting because of its applications to clinical populations. Although individual differences are integral to cognitive control, so too is our ability to aggregate information across a group of individuals, so that we can powerfully generalize and characterize the group's behavior. Hence, an examination of response inhibition would ideally involve an accurate estimation of both group- and individual-level effects. Hierarchical Bayesian analyses account for individual differences by simultaneously estimating group and individual factors and compensate for sparse data by pooling information across participants. Hierarchical Bayesian models are thus an ideal tool for studying response inhibition, especially when analyzing neural data. We construct hierarchical Bayesian models of the fMRI neural time series, models assuming hierarchies across conditions, participants, and ROIs. Here, we demonstrate the advantages of our models over a conventional generalized linear model in accurately separating signal from noise. We then apply our models to go/no-go and stop signal data from 11 participants. We find strong evidence for individual differences in neural responses to going, not going, and stopping and in functional connectivity across the two tasks and demonstrate how hierarchical Bayesian models can effectively compensate for these individual differences while providing group-level summarizations. Finally, we validated the reliability of our findings using a larger go/no-go data set consisting of 179 participants. In conclusion, hierarchical Bayesian models not only account for individual differences but allow us to better understand the cognitive dynamics of response inhibition.

## INTRODUCTION

Cognitive control is composed of a wide range of processes involving executive functioning. One widely studied component of cognitive control is response inhibition, which requires the suppression of a response (often a motor response) after a specified cue. This suppression often involves either withholding a response (“not going”) or canceling an already initiated response (“stopping”). Two paradigms often used to study response inhibition are the go/no-go task (measuring not going) and the stop signal task (measuring stopping). Response inhibition is particularly interesting because of its various clinical applications pertaining to attention deficit hyperactivity disorder (Nigg, 2001; Schachar & Logan, 1990), schizophrenia (Hughes, Fulham, Johnston, & Michie, 2012), obsessive compulsive disorder (Penadés et al., 2007; Bannon, Gonsalvez, Croft, & Boyce, 2002), and substance use disorders (Nigg et al., 2006; Monterosso, Aron, Cordova, Xu, & London, 2005).

Understanding response inhibition has been important at two seemingly different levels. First, response inhibition is often used to systematically examine differences in executive function between populations, such as comparing the stop signal RTs between children and adults or comparing neural activation during stopping processes between clinical populations and healthy controls. Second, response inhibition is often used as a diagnostic tool for assessing individuals, such as when making diagnoses about individual patients. Hence, an ideal framework for investigating response inhibition would synthesize these two objectives simultaneously. However, given the wide variability across individuals performing cognitive control tasks (Miyake & Friedman, 2012), so far it has been difficult to accurately assess individual differences, as well as justify generalization across individuals within a group.

Hierarchical (Bayesian) models are ideal inferential tools for response inhibition because estimates of group- and individual-level effects are obtained simultaneously (Turner, Sederberg, Brown, & Steyvers, 2013; Ahn, Krawitz, Kim, Busmeyer, & Brown, 2011; Lee, 2008; Shiffrin, Lee, Kim, & Wagenmakers, 2008; Rouder & Lu, 2005). In a hierarchical model, lower level parameters detailing individuals are estimated conditionally and are informed by higher level parameters detailing properties of the group. The hierarchical structure allows the information at the individual level to propagate up to a higher level and “pool,” where the pooled information can be used to perform group-level inferences. Reciprocally, the pooled information also conveys information to the lower level estimates by exerting a group-informed constraint on the individual-level model parameters. This “top–down” statistical pooling is especially helpful when data at the individual level are sparse or missing entirely. Although hierarchical models can be fitted to data in a frequentist inferential framework, Bayesian statistics offer some compelling advantages, including computational conveniences (e.g., Gibbs sampling), making them the chosen framework within this article (Lee & Wagenmakers, 2013; Lee, 2008; Shiffrin et al., 2008).

As many scientists wish to understand the neural basis of response inhibition, a popular measure used to study cognitive control is through fMRI. Although fMRI data provide convincing spatial resolution, they are notoriously noisy. This makes a systematic extraction of the inhibition signal an elusive endeavor in terms of both experimental design and statistical inference. Moreover, imaging studies have logistical constraints that are not present in experiments outside the scanner (e.g., safety protocol, structural scans, and hemodynamic lag), so there are often strict practical limitations on the number of trials experimenters can obtain for each individual participant. Bayesian analyses (including hierarchical Bayesian analyses) have been applied to fMRI data in many ways, including spatial priors, adaptive priors, and modeling effective connectivity (see Zhang, Guindani, & Vannucci, 2015, for a review). Here, we extend the generalized linear model (GLM) hierarchically by adding participant- and region-specific effects. Unlike standard GLM analyses, the models we develop produce single-trial estimates and measures of coactivation, so that we can compare patterns of activation across individuals. Beyond the practical benefits, as we show below, hierarchical Bayesian models are better able to accurately recover individual differences compared with a nonhierarchical Bayesian GLM.

The models we present below not only improve estimates of neural activity but also provide more detailed information than a standard analysis to further aid in understanding response inhibition. First, the models can allow for some temporal dynamics by estimating the neural response corresponding to every stimulus. Although fMRI has lower temporal resolution than other measures, single-trial estimates can still account for interactions occurring over time that may affect the neural response. For example, a stop trial after a long series of go trials may produce a different neural response compared with a stop trial immediately after another stop trial, due to the sequential dependency in brain dynamics. Second, the model's estimation of a coactivation matrix allows for investigations into the functional connectivity of key ROIs at both group and individual levels. We should note that using the coactivation matrix as a measure of functional connectivity has a slightly different interpretation from other measures of functional connectivity, so this should be considered before making direct comparisons to other methods. We estimate functional connectivity as correlations between coactivation, whereas it is often calculated by correlating the raw time series data. However, our method of analyses allows us to directly compare “group” and “individual” functional connectivity measures by using two different models. We believe this particular advantage is quite compelling, as it gives us an opportunity to preserve stable individual-specific characteristics of functional connectivity that are abstracted away from idiosyncratic details of the experimental design (Swick, Ashley, & Turken, 2011). For example, Gratton et al. (2018) demonstrated that functional connectivity measures remain stable across individuals and are significantly less variable compared with task and session factors. Hence, modeling individual-specific patterns of coactivation should provide a way of assessing individual variation relative to the group.

In this article, we aim to demonstrate the utility of hierarchical Bayesian models by using them to characterize individual differences in response inhibition. First, we detail the experimental methods and specify the set of models under investigation. Second, in a simulation study, we show that hierarchical Bayesian models are preferred to conventional GLMs (C-GLMs), particularly when the true data possess individual variation. Third, we apply the hierarchical Bayesian models to fMRI data from go/no-go and stop signal experiments. Here, we compare individual and task differences across neural activation and functional connectivity. We also provide evidence that variability in the tasks is a result of individual differences as opposed to run-to-run variability, sample size, or method of analyses. We conclude with a discussion comparing our work to previous findings, while emphasizing the prevalence and importance of individual variation in cognitive neuroscience.

## METHODS

### Participants

The 11 participants analyzed in this study were part of a larger study involving multiple cognitive tasks and well-being inventories (Gaut et al., 2019; Molloy et al., 2018). These 11 participants, in contrast to the rest of the group from the first scan, were recruited to take part in a second experiment. Whereas the first session involved a go/no-go task, the second session involved only a stop signal task. All participants were recruited from The Ohio State University and the surrounding community, and each provided informed consent. The study was approved by the institutional review board of the university. Among the 11 participants (mean age = 24.6 years, ranging from 18 to 48 years) included in the analysis, there were five women and six men.

### Stimuli

All stimuli were programmed in MATLAB using Psychtoolbox extensions (psychtoolbox.org) on a Windows PC. The participants lay supine on the scanner bed and viewed the visual stimuli back-projected onto a screen through a mirror attached onto the head coil. In the go/no-go task, participants were instructed to press a button when they viewed an A, B, C, D, or E and to not press any button when they viewed an X, Y, or Z. The button response was collected using an MRI-compatible fiber-optic response pad (https://www.curdes.com/). The transistor–transistor logic output from the fiber-optic response pad was fed into the RTBox (Li, Liang, Kleiner, & Lu, 2010) to measure RT with high accuracy. The stop signal task contained both of these “go” and “no-go” trials, but also on some trials, a go signal was presented but then after a delay, a stop signal (square around the letter) appeared on the screen. The go/no-go task consisted of 75 “go” and 25 “no-go” trials, for a total of 100 trials. The stop signal task consisted of 64 “go” trials, 16 “no-go” trials, and 80 “stop” trials of three different delays (individually fit for each participant, based on RT distributions in pilot testing). There were 160 trials per run, and each participant completed three runs of the stop signal task, resulting in 480 trials total. In this study, our analysis focused on just the first run from both tasks. Figure 1 shows the trial examples for both go/no-go and stop signal tasks. The jitter in each trial was designed in such a way that the trial duration ranged from 3 to 7 sec, with an increment of 1 sec. The trial duration was optimized by optseq (https://surfer.nmr.mgh.harvard.edu/optseq/).

### MRI Data Acquisition

MRI recording was performed using a 12-channel head coil in a Siemens 3T Trio Magnetic Resonance Imaging System with TIM, housed in the Center for Cognitive and Behavioral Brain Imaging at The Ohio State University. BOLD functional activations were measured with a T2*-weighted EPI sequence (repetition time = 2000 msec, echo time = 28 msec, flip angle = 72 deg, field of view = 222 × 222 mm, in-plane resolution = 74 × 74 pixels or 3 × 3 mm, and 38 axial slices with 3-mm thickness to cover the entire cerebral cortex and most of the cerebellum). In addition, the anatomical structure of the brain was acquired with the three-dimensional MPRAGE sequence (1 × 1 × 1 mm^{3} resolution, inversion time = 950 msec, repetition time = 1950 msec, echo time = 4.44 msec, flip angle = 12 deg, matrix size = 256 × 224, 176 sagittal slices per slab; scan time = 7.5 min) for each participant.

### Image Preprocessing and Analysis

The fMRI preprocessing was carried out using FEAT (fMRI Expert Analysis Tool; Woolrich, Ripley, Brady, & Smith, 2001) in FSL (FMRIB Software Library, Version 5.0.8; Smith et al., 2004). The first six volumes were discarded to allow for T1 equilibrium. The remaining images were then realigned to correct head motion. Data were spatially smoothed using a 6-mm FWHM Gaussian kernel. The data were filtered in the temporal domain using a nonlinear high-pass filter with a 90-sec cutoff. A two-step registration procedure was used whereby EPI images were first registered to the MPRAGE structural image and then into the standard (MNI) space using affine transformations. Registration from the MPRAGE structural image to the standard space was further refined using FNIRT nonlinear registration.

After the neural data were preprocessed, the time series from 24 ROIs were extracted. The selection of ROIs was based on related literature (Dunovan, Lynch, Molesworth, & Verstynen, 2015). Table 1 shows information about ROIs and their corresponding indices used in later figures. The MNI coordinates in the table defined the center of each ROI, and the radius of a sphere ROI was estimated from the number of voxels provided in Dunovan et al. (2015).

Number
. | Name
. | MNI xyz
. | nVox (<40)
. |
---|---|---|---|

1 | Callosum | [3 −23 29] | 208 |

2 | Posterior cingulate cortex | [−2 −56 22] | 957 |

3 | preSMA | [4 21 47] | 1952 |

4 | Left angular gyrus | [−44 −72 30] | 328 |

5 | Left fusiform gyrus | [−43 −60 −17] | 84 |

6 | Left IFG-1 | [−37 18 −4] | 912 |

7 | Left IFG-2 | [−44 9 29] | 426 |

8 | Left inferior parietal lobe | [−34 −52 46] | 459 |

9 | Left inferior temporal gyrus | [−56 −10 −20] | 44 |

10 | Left insula | [−39 −3 7] | 41 |

11 | Left middle frontal gyrus | [−3 50 −9] | 477 |

12 | Left putamen | [−27 −13 7] | 48 |

13 | Left superior frontal gyrus | [−9 57 35] | 128 |

14 | Left thalamus | [−6 −16 −2] | 72 |

15 | Left ventral striatum | [−1 16 −9] | 100 |

16 | Right caudate | [13 10 6] | 55 |

17 | Right IFG | [43 20 12] | 2830 |

18 | Right inferior parietal lobe | [48 −44 43] | 1400 |

19 | Right middle frontal gyrus | [38 48 −10] | 83 |

20 | Right middle temporal gyrus | [49 −66 26] | 60 |

21 | Right precuneus | [12 −67 42] | 83 |

22 | Right putamen | [31 −11 4] | 44 |

23 | Right superior frontal gyrus | [21 49 31] | 45 |

24 | Right thalamus | [9 −16 3] | 154 |

Number
. | Name
. | MNI xyz
. | nVox (<40)
. |
---|---|---|---|

1 | Callosum | [3 −23 29] | 208 |

2 | Posterior cingulate cortex | [−2 −56 22] | 957 |

3 | preSMA | [4 21 47] | 1952 |

4 | Left angular gyrus | [−44 −72 30] | 328 |

5 | Left fusiform gyrus | [−43 −60 −17] | 84 |

6 | Left IFG-1 | [−37 18 −4] | 912 |

7 | Left IFG-2 | [−44 9 29] | 426 |

8 | Left inferior parietal lobe | [−34 −52 46] | 459 |

9 | Left inferior temporal gyrus | [−56 −10 −20] | 44 |

10 | Left insula | [−39 −3 7] | 41 |

11 | Left middle frontal gyrus | [−3 50 −9] | 477 |

12 | Left putamen | [−27 −13 7] | 48 |

13 | Left superior frontal gyrus | [−9 57 35] | 128 |

14 | Left thalamus | [−6 −16 −2] | 72 |

15 | Left ventral striatum | [−1 16 −9] | 100 |

16 | Right caudate | [13 10 6] | 55 |

17 | Right IFG | [43 20 12] | 2830 |

18 | Right inferior parietal lobe | [48 −44 43] | 1400 |

19 | Right middle frontal gyrus | [38 48 −10] | 83 |

20 | Right middle temporal gyrus | [49 −66 26] | 60 |

21 | Right precuneus | [12 −67 42] | 83 |

22 | Right putamen | [31 −11 4] | 44 |

23 | Right superior frontal gyrus | [21 49 31] | 45 |

24 | Right thalamus | [9 −16 3] | 154 |

The table shows the number index, name of each ROI, MNI coordinates, and the number of voxels. MNI = Montreal Neurological Institute.

## MODEL SPECIFICATION

In addition to a standard GLM, we developed two models for the time series of the BOLD response by building in two types of individual variation. Figure 2 shows a graphical diagram of the two hierarchical models (Model 1 [M1] and Model 2 [M2]) and the C-GLM (right). Each node represents a parameter in the model, where shaded nodes are observed data and empty nodes are latent or unobserved parameters. Arrows represent conditional dependencies between parameters, and plates represent replications or loops across dimensions (such as conditions or participants). Both M1 and M2 provide estimates of the neural response to each stimulus. These single-stimulus parameter estimates provide temporal information that is lacking in a C-GLM analysis, as indicated by the plate notation in Figure 2. Although temporal precision in fMRI is not as precise as in other modalities such as EEG, single-stimulus estimates can still provide valuable information of activity across a run. In addition, the models we developed contain both individual- and group-level responses to a condition (i.e., stop, go, or no-go) that correspond to the neural activation (β) estimates from a standard GLM. However, unlike GLMs, the group and individual responses are estimated simultaneously. Finally, measures of functional connectivity between ROI pairs are built into both M1 and M2 through the variance–covariance matrix Σ. The difference between M1 and M2 is the assumption about these coactivation matrices. Specifically, whereas M1 contains only a single variance–covariance matrix across all participants (i.e., no individual variation), M2 contains a variance–covariance matrix for each participant (i.e., full individual variation). By comparing the single variance–covariance matrix in M1 to the set of variance–covariance matrices in M2, we can gain a better appreciation of the degree to which individual variation plays a role in assessing response inhibition.

Although both models are complex relative to the C-GLM, we previously found that these particular hierarchical levels allow better constraint and generalizability. In Molloy et al. (2018), we built five increasingly complex models, all with single-stimulus estimates, of the neural time series of the stop signal task. The simplest model had no hierarchical component. The next models constructed hierarchies across conditions; across conditions and participants; and finally across conditions, participants, and ROIs (i.e., M1 and M2). The models were compared in terms of fit to data, parameter constraint, and generalizability to other experimental runs within the same participant. We found that constructing a hierarchy across (at least) conditions and participants provided the best balance between fit, constraint, and generalizability. We chose to include the models that also included ROIs in this analysis to simultaneously estimate coactivation and understand individual differences within functional connectivity. In the previous analyses, M1 and M2 performed similarly well in terms of fit, constraint, and generalizability. Our rationale for focusing on M1 and M2 in this article is because they are particularly well postured to assess the magnitude of individual differences, especially with respect to differences in functional coactivation.

### Conceptual Overview

Here, we conceptually describe the models under consideration while providing an explanation of assumptions motivating each component. Explicit specifications of the likelihood and priors of all three models can be found in the Appendix. To begin, let *N*_{i,k,j,r,t} denote the observed BOLD response for the *i*th stimulus, *k*th condition, *j*th participant, *r*th ROI, at each time point *t*. We will use this notation for each effect across all model parameters to ensure consistency. Because the C-GLM considers ROIs and participants independently, the neural data node in Figure 2 is simply *N*_{k,t}. Although *N*_{i,k,j,r,t} is the only filled/observed node in Figure 2, the design matrix (not included for visual clarity) containing information about the stimulus condition and onset time is also observed and used within the model. Excluding the design matrix and neural time series, every other component of each model is latent or unobserved and hence will need to be estimated.

The three latent parameters that are directly related to the neural data *N*_{i,k,j,r,t} are $\beta j,r0$, σ_{j,r}, and β_{i,j,k,r}. These components are analogous to the parameters that would be estimated in a C-GLM analysis, where $\beta j,r0$ is the baseline activation (see β^{0} in the C-GLM), σ_{j,r} is the noise term (σ in the C-GLM), and β_{i,j,k,r} is the neural activation (β_{k} in the C-GLM). The major difference between these parameters and their GLM counterparts is they are estimated within a hierarchical framework, and thus, information from higher levels in the model is used to inform estimates on lower levels. Additionally, the neural activation is predicted on a single-stimulus level, as discussed previously.

Across the three models, the neural activation, baseline activation, and noise terms are all devised with specific assumptions in place. First, in our models, we assume that each participant will have distinct levels of baseline activation $\beta j,r0$ for each ROI. Additionally, this baseline activation is informed by a hyperparameter $\mu r0$, which we assume may vary from ROI to ROI, in a way that is consistent across individuals. Second, we assume that the noise components σ_{j,r} may differ across ROIs and individuals.

Finally, the single-stimulus neural activity parameter β_{i,j,k,r} is the most theoretically relevant to our research questions and the most complex compared with the other parameters. The neural activity parameter β_{i,j,k,r} is informed by two parameters: $\sigma r0$, a noise term, which informs the standard deviation of β, and δ_{j,k,r}, which informs the mean. The parameter $\sigma r0$ allows the noise level for the single-stimulus β estimates to vary across ROIs. The parameter δ_{j,k,r} is a condition- and participant-level hyperparameter of neural activation for each ROI. We will use this hyperparameter as an indicator of an individual's neural response in a particular ROI to going, not going, or stopping.

The individual conditional neural response variable δ_{j,k,r} is also informed by two different hyperparameters: μ_{k,r}, which informs the mean, and Σ, which informs the standard deviation. The parameter μ_{k,r} is interpreted as the group neural response to a condition in a particular ROI, and Σ denotes the covariance matrix. Σ is a 24 × 24 matrix (where the dimensions correspond to the 24 ROIs) that estimates the pairwise correlations of coactivation between ROIs over time and is the only difference between M1 and M2. Specifically, M1 assumes that all individuals have the same coactivation matrix Σ, whereas M2 assumes a separate Σ for every individual. By constraining Σ to be equivalent or separate across individuals, our modeling analyses allow us to explore whether individual differences are also present in the interactions between different ROIs.

### Fitting Details

All models were fit using Just Another Gibbs Sampler (JAGS; Plummer, 2003). Each model was fit using a standard pipeline procedure involving an initialization stage, a burn-in stage, and a sampling stage. In the initialization stage, three chains were placed in the parameter space and then underwent an adaptation period where the tuning parameters of the algorithms used by JAGS were adjusted for the particular parameters of a given model. In the burn-in stage, chains migrated to the highest density areas of the parameter space. Technically, the movement of the chains in this stage provides information about the target posterior, but these stochastic transitions are not considered true samples from the posterior because the chains have not yet converged to the region of the parameter space that most accurately defines the posterior's shape. Hence, these samples are not used when forming our estimate of the posterior distribution. In the sampling stage, the stochastic movement of the chains within the parameter space constitutes a series of draws from the target posterior distribution when collapsed across iterations. The collection of draws through iterations of the algorithm is what is used to estimate our posterior distribution, and this posterior is what we used to interpret the results of our parameters of interest.

In the simulation study, the C-GLM and M2 models were fit. For the C-GLM, model initialization ran for 2000 adaptations, followed by a burn-in period of 4000 iterations. The posterior sampling then ran for 6000 iterations. There were 18,000 samples for each parameter. For M2, model initialization ran for 1000 adaptations, followed by a burn-in period of 2000 iterations. The posterior sampling then ran for 3000 iterations. Hence, 9000 samples were used to estimate the joint posterior distribution of the model parameters. In the real data analyses, both M1 and M2 were fit to the data. For both models, model initialization ran for 1000 adaptations, followed by a burn-in period of 2000 iterations. The posterior sampling then ran for 3000 iterations. Hence, 9000 samples were used to estimate the joint posterior distribution for each parameter. In the large sample analyses, the C-GLM and M2 were fit in the same way as for the simulation study and real data analyses. For the C-GLM, model initialization ran for 2000 adaptations, followed by a burn-in period of 4000 iterations, and sampled for 6000 iterations. For M2, model initialization ran for 1000 adaptations, followed by a burn-in period of 2000 iterations, and sampled for 3000 iterations. For all models, the chains were plotted and visually checked for convergence.

## ADVANTAGES OF MODELING INDIVIDUAL DIFFERENCES: A SIMULATION STUDY

The multiple levels of hierarchy cause Models 1 and 2 to be much more complex than a C-GLM analysis, allowing for extraction of additional information from the data. For example, while the C-GLM estimates just one β value per condition, our hierarchical models estimate this conditional information, as well as a β estimate in response to every stimulus, providing information about temporal dynamics. Additionally, our hierarchical models show relationships between pairs of ROIs (through estimation of Σ) providing insight into functional connectivity. However, this additional complexity may come at a cost. In addition to the computational burden, hierarchical models could potentially overfit the data, potentially confusing noise for signal. Hence, it is important to first test our hierarchical models on the basis of how well they recover the true state of the world. Unfortunately, when using real experimental data, there is no way of knowing whether or not latent parameter estimates are actually correct. By simulating data based on chosen parameter values, we can identify how well a model is able to recover estimates. The accuracy and precision of the recovered posteriors can be used to infer if a model is providing accurate estimates when fit to real data. In this section, we compare the more complex M2 to the simpler C-GLM in terms of recovering both signal and noise.

### Methods

Data sets were generated using both the C-GLM and M2. To ensure that the generated data were realistic, the parameters and experimental design of the generated data were based on the fits and design of the real data. The data sets consisted of time series data for 24 ROIs and 11 participants to keep the dimensionality the same as the dimensionality of our real data. To choose reasonable “true” values, we fit the C-GLM and M2 to the real data and used the means of the posteriors from the β_{0}, conditional βs (for C-GLM: β and for M2: δ), and σ terms as the true values. The design matrix (i.e., presentation onset and condition order) was identical to the design matrix used in the experimental data. To test recoverability, we fit both models to both sets of generated data. In other words, the time series data generated by the C-GLM were fit by both the C-GLM and M2 models, and likewise, the time series data generated by M2 were fit by both the C-GLM and M2 models.

### Recovering Signal

We begin by comparing the ability of each model to accurately recover the “signal” present in the simulated data. Here, we directly compared the C-GLM and M2's ability to recover the neural activation in response to a go, no-go, or stop stimulus. Figure 3 shows the recovery of the condition-level β estimates where the top row shows the recovery for data generated by the C-GLM, and the bottom row shows the recovery for data generated by M2. The *y*-axis shows the true values, and the *x*-axis shows the recovered values from the C-GLM (blue) and M2 (red). Each point corresponds to a particular participant and ROI combination. The black line shows where the true and recovered values are equivalent; points that lie above the line underestimate the true values, and points that lie below the line overestimate the true values.

For the data simulated by the C-GLM, both the C-GLM and M2 can accurately recover the signal. However, for the stop condition, the C-GLM tends to overestimate smaller β values. Importantly, the added complexity in M2 does not hurt estimation of the signal parameters. Even when the C-GLM is the true data generating model (e.g., no stimulus-to-stimulus variability, no constraint based on other participants or ROIs), the more complex hierarchical model is still able to make reasonable estimates. The differences between the C-GLM and M2 are more discernible when fit to the data generated by the more complex model M2 (Turner, Wang, & Merkle, 2017). Here, both models can accurately recover β, but at the extremes, the C-GLM tends to underestimate small (negative) βs and overestimate large βs. Overall, both models can recover the signal, but when assuming the presence of individual differences (i.e., when data are generated by M2), M2 can better predict the extreme values of neural activity.

### Recovering and Differentiating Noise

The ability to discriminate between noise and the underlying signal in neural data is essential. Here, we compare the abilities of the C-GLM and M2 to accurately recover the noise term σ. In addition to the common (i.e., present in both C-GLM and M2) observation noise term σ, M2 estimates the variability from stimulus to stimulus through the term σ^{β}. By comparing the estimated σ^{β} term across M2 when fit to data either with or without stimulus-to-stimulus variability, we can assess the degree to which (1) M2 accurately estimates σ^{β} and (2) the estimates of the C-GLM are distorted by this additional, unaccounted for variance term. Figure 4 shows the recovery of observation noise (σ; left column) and the estimation of stimulus-to-stimulus variability (σ^{β}; right column). The results for the data generated by the C-GLM are in the first row, whereas the results for data generated by M2 are in the second row. The top left shows that both the C-GLM (blue) and M2 (red) are able to accurately recover σ. The top right shows M2's estimates of σ^{β} when fit to data generated by the C-GLM. Recall that the C-GLM assumes that there is no variability between trials and does not have a σ^{β} term, so the recoverability of the C-GLM and M2 cannot be directly compared. Instead, Figure 4 shows the posterior estimate of σ^{β} from M2 as box plots for each ROI. The blue horizontal line is at zero, which can be thought of as the true σ^{β} for the C-GLM. The estimates are not near zero and, as we will see next, are much larger than the estimates recovered from the data generated by M2. The large variability of the estimated posterior distributions are likely a result of the data not providing enough constraint to the posteriors, so the high means and wide spreads of the posteriors resemble the prior distributions.

The second row of Figure 4 shows the model-fitting results when the data were generated by M2. The top left shows that, unlike the results from the C-GLM, there is a clear difference in ability to recover σ between the C-GLM and M2. The C-GLM underestimates σ for almost every participant and ROI. Conversely, M2 closely recovers σ, although there is a slight tendency to overestimate it, especially as the true value of σ increases. The C-GLM's consistent underestimation of σ indicates that it is unable to isolate the effects of signal from the effects of noise. The bottom right shows the recovered σ^{β}s from M2. M2 slightly overestimates the true σ^{β}, but the range of these estimates is starkly different from the σ^{β} estimates of the data generated by the C-GLM in the top right. In conclusion, the stimulation study shows that M2 not only provides additional information for differentiating variability but also accurately recovers both signal and noise. By contrast, when the C-GLM is misspecified (i.e., when a more complex model generates the data), the estimated parameters are severely biased in ways that might affect our conclusions, as the bias of estimated noise could impact statistical significance.

## UNDERSTANDING RESPONSE INHIBITION THROUGH MODELING INDIVIDUAL DIFFERENCES

In the simulation study, we demonstrated that M2 is preferable to the C-GLM in situations where additional variability such as stimulus-to-stimulus variability or individual differences are present in the data. In this section, we apply our hierarchical models (M1 and M2) to the experimental data from go/no-go and stop signal tasks. First, we analyze neural activation in response to different conditions. Here, we discuss key patterns in both the go/no-go and stop signal tasks but also examine how individual variability differs across ROIs and conditions. Second, we investigate functional connectivity in the go/no-go and stop signal tasks on both group and individual levels. We aim to uncover any consistent patterns of coactivation between individuals for the two tasks. Finally, we explore the role of individual differences in response inhibition. Specifically, we examine whether it is necessary to assume that individual factors affect coactivation in the brain (as we do in M2).

### Neural Activation When Inhibiting a Response

#### Go/No-go

As expected, go and no-go stimuli evoke different responses in the brain. Figure 5A shows the mean group results of the δ distributions by condition in the go/no-go task. Each numbered dot corresponds to an ROI (location is approximated for visualization). The color of the dot denotes neural activation, where cooler colors represent a smaller or more negative activation and warmer colors represent a larger activation. Importantly, the individual variability in neural activation differs across ROI and condition. Figure 5B shows the mean of each individual's δ estimate (measured by the neural activation; *y*-axis) for each ROI (*x*-axis). Conditions are represented by semitransparent rectangles, with green rectangles for the go condition and blue rectangles for the no-go condition. The length of the semitransparent rectangles denotes the 95% credible interval of the δ_{j,k,r} posteriors. The green and blue *X*s represent the group mean for go and no-go, respectively, and are thus equivalent to the neural activation patterns in Figure 5A.

In our discussion, we focus on the major areas thought to be involved in response inhibition: the inferior frontal gyrus (IFG), the preSMA, and the BG—particularly, the subthalamic nucleus (Aron, Robbins, & Poldrack, 2014). In our analyses, the IFG consists of three ROIs: the right IFG (ROI 17), and two ROIs comprising the left IFG (left IFG-1 and left IFG-2; ROI 6 and ROI 7). One ROI comprises the bilateral preSMA (ROI 3). Finally, three ROIs correspond to BG structures found by Dunovan et al. (2015) to be involved in response inhibition: the right caudate (ROI 16), the left thalamus (ROI 14), and the right thalamus (ROI 24). In the go/no-go task, only one area, the right caudate, showed higher mean neural activation in response to go stimuli than in response to no-go stimuli (ROI 16, μ_{Go} − μ_{No-go} = 0.16 [−0.26, 0.57]). All of the other areas of interest, including the IFG (left IFG-1, ROI 6: μ_{Go} − μ_{No-go} = −0.20 [−0.61, 0.20], left IFG-2, ROI 7: μ_{Go} − μ_{No-go} = −0.29 [−0.76, 0.18], and right IFG, ROI 17: μ_{Go} − μ_{No-go} = −0.23 [−0.58, 0.12]), the preSMA (ROI 3, μ_{Go} − μ_{No-go} = −0.14 [−0.55, 0.26]), and the thalamus (left, ROI 14: μ_{Go} − μ_{No-go} = −0.17 [−0.58, 0.21] and right, ROI 24: μ_{Go} − μ_{No-go} = −0.15 [−0.53, 0.24]), showed more activation in response to no-go stimuli than to go stimuli. The bracketed values indicate the 95% credible intervals, calculated by taking the 2.5% and 97.5% quantiles of the μ_{Go} − μ_{No-go} posterior. Note that all of these 95% credible intervals contain zero.

#### Stop Signal

Figure 6A shows aggregated group results for each ROI in the stop signal task. Each row corresponds to a condition, where go is in the top row, no-go is in the middle row, and stop is on the bottom row. In Figure 6B, conditions are represented by semitransparent rectangles, where green rectangles are for the go condition, blue rectangles are for the no-go condition, and red rectangles are for the stop condition. Again, the length of the semitransparent rectangles denotes the 95% credible interval of the δ_{j,k,r} posteriors. The layout of this plot is otherwise identical to the layout of Figure 5. The model fit to the stop signal task also shows clear conditional differences in average activation, as well as differences in variability from ROI to ROI. The most striking result in Figure 6 is the neural deactivation across the brain in response to a stop signal. Mean activation was higher in response to go stimuli than in response to no-go stimuli for every area of interest discussed in the go/no-go task: bilateral IFG (left IFG-1, ROI 6: μ_{Go} − μ_{No-go} = 0.29 [−0.13, 0.70]; left IFG-2, ROI 7: μ_{Go} − μ_{No-go} = 0.27 [−0.21, 0.74]; right IFG, ROI 17: μ_{Go} − μ_{No-go} = 0.29 [−0.11, 0.67]); preSMA (ROI 3: μ_{Go} − μ_{No-go} = 0.24 [−0.19, 0.66]), right caudate (ROI 16, μ_{Go} − μ_{No-go} = 0.36 [−0.11, 0.67]), and bilateral thalamus (left thalamus, ROI 14: μ_{Go} − μ_{No-go} = 0.021 [−0.40, 0.43]; right thalamus, ROI 24: μ_{Go} − μ_{No-go} = 0.19 [−0.24, 0.62]). This result is nearly opposite to the results of the go/no-go task, where only the right caudate displayed a pattern of mean higher activation in go than no-go. Again, the bracketed values indicate the 95% credible intervals of the μ_{Go} − μ_{No-go} posterior. Note that all of these intervals contain 0, suggesting that, again, on a group level, there is not a strong difference between the go and no-go conditions within these key ROIs. Additionally, for these seven areas of interest, there was higher mean activation in response to go stimuli than in response to stop signals (μ_{Go} − μ_{Stop}: left IFG-1, ROI 6 = 0.45 [−0.065, 0.92], left IFG-2, ROI 7 = 0.67 [0.19, 1.20]*, right IFG, ROI 17 = 0.18 [−0.24,0.60], preSMA, ROI 3 = 0.87 [0.38, 1.40]*, right caudate, ROI 16 = 0.66 [0.13, 1.20]*, left thalamus, ROI 14 = 0.30 [−0.18, 0.79], and right thalamus, ROI 24 = 0.45 [−0.05, 0.92]). Here, three areas (left IFG-2, preSMA, and right caudate) had 95% credible intervals that were positive and did not contain zero, suggesting strong deactivation within the stop condition when compared with go. Furthermore, for six of seven of these areas of interest, there was higher activation in mean response to no-go stimuli than in response to stop signals (μ_{No-go} − μ_{Stop}: left IFG-1, ROI 6 = 0.16 [−0.37, 0.66], left IFG-2, ROI 7 = 0.40 [−0.16, 0.97], preSMA, ROI 3 = 0.63 [0.11, 1.10]*, right caudate, ROI 16 = 0.30 [−0.26, 0.85], left thalamus, ROI 14 = 0.28 [−0.19, 0.77], and right thalamus, ROI 24 = 0.26 [−0.24, 0.76]). Here, only the credible interval for the preSMA did not contain zero, again suggesting strong deactivation in the stop condition, when compared with no-go as well. The mean for the right IFG was slightly negative, although the 95% credible interval still included 0 (μ_{No-go} − μ_{Stop}: right IFG, ROI 17 = −0.10 [−0.53, 0.34]). Although both not going and stopping measure a type of response inhibition, there was a clear difference in their neural responses.

### Functional Connectivity Within Response Inhibition Tasks

#### Go/No-go

Both models estimate Σ as a covariance matrix, but for the purposes of interpretability, each prediction (for each sample and chain) was converted into a correlation matrix and then averaged. Figure 7A shows a plot of the Σ matrix estimated from M1 fit to go/no-go data and four plots of Σ_{j} estimated from M2 fit to go/no-go data from four representative participants. In all five matrices, the diagonal components were removed to avoid distorting the scale, as the diagonal is always equal to 1.0. All five plots are colored according to the same scale, where warmer colors show a higher correlation and cooler colors show a smaller or more negative correlation.

Overall, there were no consistent patterns of coactivation. First, there were no consistent trends on an individual level (shown by the lack of similarities between the individual σ estimates from M2). Second, the group-level matrix of M1 did not resemble any of the individual matrices. Although this may be somewhat apparent from Figure 7A to provide more quantitative evidence, we calculated the differences between the pairwise correlations of M1 and the pairwise correlations of M2 for each individual. Figure 7B displays the box plots of these differences, where each box plot represents a different participant. Positive values (above the horizontal line) indicate that M1 estimated higher correlations than M2 for a given participant. The mean coactivation for M1 was higher than that of M2 for every participant. To further corroborate this difference between M1 and M2, we calculated the region of practical equivalence (ROPE; Kruschke & Liddell, 2018). We defined the ROPE as ranging from −0.05 to 0.05. If a large proportion of the differences are within this ROPE, we would conclude that M1 and M2's estimates of functional connectivity are essentially the same. However, we found that a minority of the samples (13.98%) were within the ROPE. These results taken together suggest that individual differences in functional connectivity are a major source of variation in the go/no-go task.

#### Stop Signal

Figure 8A shows the group correlation matrix from M1 and representative participants' correlation matrices from M2. The scale in this figure is for the ranges of the stop signal Σs and is not equivalent to the scale in Figure 7A, but otherwise, the layout of the figures is equivalent. Similar to the go/no-go task, the group matrix did not represent the individual matrices. To quantify these dissimilarities, we again constructed box plots of the differences in coactivation estimates between M1 and M2, shown in Figure 8B. As observed previously in the go/no-go task, the mean differences were all positive, meaning that M1 estimated higher coactivations than M2 across all 11 participants. Furthermore, the ROPE analysis (with the same defined region) concluded that only 7.80% of the differences were within the ROPE. This is a smaller proportion than that calculated for the go/no-go task, suggesting that the differences between M1 and M2 are even more pronounced in these data. Additionally, the individual matrices show no recurrent patterns. Thus, we can conclude that individual differences in functional connectivity also arise in the stop signal task. Moreover, these stop signal matrices, on both group and individual levels, did not resemble the go/no-go matrices, although there is some similarity for Participants 4 and 11.

The model estimates of Σ and δ demonstrate widespread differences between tasks and individuals. In both go/no-go and stop signal tasks, the degree of individual differences in these neural responses differed from ROI to ROI (Figures 5B and 6B). Comparisons of Σ further support the claim that individual differences are integral to (but not similar within) both go/no-go and stop signal tasks (Figures 7 and 8).

## DISTINGUISHING BETWEEN INDIVIDUAL DIFFERENCES AND ADDITIONAL FACTORS

We argue that the variability we observed in neural activation and functional connectivity is the result of individual differences. However, there are many other factors that could produce this variability. First, we used only 11 participants in our analyses, and this relatively small sample size could be causing a more extreme (nontypical) participant to influence the hyperparameters of the hierarchical model, skewing the results of every individual. Second, hierarchical Bayesian models are much more complicated than standard GLM analyses. Although our simulation study provided evidence that the hierarchical approach is preferable, these advantages could be present only in smaller samples. Third, the variation observed may actually be a result of run-to-run differences. It is plausible that what we perceive as variation caused by individual differences may actually just be a result of another phenomenon, such as practice effects. In this section, we aim to validate that the variability we observed is actually a result of individual differences, not these aforementioned possible confounds.

### Sample Size and Analysis Variability

In all of the above analyses, we chose to focus on 11 participants who completed both response inhibition tasks to directly compare the tasks and types of inhibition while accounting for the distinct individual differences. These 11 participants, as noted, were part of a larger pool of participants consisting of 168 other participants who also completed one run of the go/no-go task. The task details and model fitting procedures (for both M2 and the C-GLM) are identical to those described in the methods. This larger subset of the same task fit to the same models allows us to decouple any confounds that may have arisen from using a smaller sample size. In this section, we aim to demonstrate that the results are consistent in small and large sample sizes. First, the results we reported on above are not greatly influenced by using more participants to inform the hierarchical hyperparameters. Second, hierarchical Bayesian models are advantageous over the C-GLM even in larger data sets.

Figure 9 compares the neural activation estimates of going and not going (δ) from M2 and the C-GLM in small (*n* = 11) and large (*n* = 179) data sets. In each panel, the points represent a δ estimate for one ROI and one participant for either the go or no-go conditions, denoted by green or blue points, respectively. The diagonal line signifies equivalence between the methods and sample sizes being compared, and the printed value in the top left corner is the correlation. The left plot compares the hierarchical estimates between the 11 participants for the small (*x*-axis) and large (*y*-axis) samples. The two estimates are highly correlated, showing that the hyperparameters in the larger sample size do not greatly skew the individual estimates. This suggests that the results we observed (for the smaller sample size) are consistent regardless of sample size. The next two panels of Figure 9 compare the methods of analyses in the small samples (middle plot) and large samples (right plot). In both the small and large samples, the estimates of M2 and the C-GLM are highly correlated, though the smallest and largest values are more extreme in the C-GLM than in M2 in both sample sizes. The no-go values are particularly more extreme for the C-GLM, as there are fewer trials than the go trials, so the advantages from the pooling of data in M2 are most noticeable. Thus, the shrinkage imposed by the hierarchy is consistent regardless of sample size. Additionally, this pattern of shrinkage parallels the results we observed in the simulation study (see Figure 3). From these comparisons, we can be more confident that the variability we observed is a result of individual differences as opposed to confounding factors from sample size or method of analysis.

### Run-to-run Variability

Using additional data from the go/no-go task, we demonstrated that sample size and method of analysis are distinct from the individual differences we observed. Now, we use the additional data from the additional runs of the stop signal task of the original 11 participants to distinguish run-to-run differences from individual differences. First, we compare how functional connectivity in general varies across participants and across runs. If individual differences are more important than run-to-run differences, we would expect that the coactivation matrices estimated for each run of the model are higher correlated within a particular participant than when compared with another participant. In this case, coactivation would be more similar within a participant even across runs than between participants. Second, to further ensure that individual differences are key, we compare model fit of models including details from other runs to model fit of models not including details from other runs. In this case, if individual differences are more important, then the model fit will be improved when considering connectivity from that participant in a previous run. If run-to-run differences are the source of variability, then including information from a different run would not improve model fit statistics.

#### Functional Connectivity between and across Participants

This section comprises two analyses. M2 was used to test these analyses because it allows individual differences to be present in coactivation. The first analysis looks at differences between individuals' coactivation when all three runs are considered. Here, we took the pairwise correlations between each participant's coactivation matrix. The second analysis tests how an individual can vary with from one run to another. Here, we look at the correlations of the coactivation from one run to the next within a single participant.

First, Figure 10A shows correlations between Σ matrices across different participants. The figure shows an 11 × 11 matrix showing the correlations between the coactivation matrix of one participant to the coactivation matrices of each of the other participants. Each column/row represents an individual participant. The correlations are for all three runs and were calculated by concatenating the three coactivation matrices output from each model fitting for each run. Second, we wanted to explore whether or not the coactivation matrices were similar when compared on a run-to-run basis in an individual. Figure 10A shows 11 3 × 3 plots showing the correlations between the coactivation matrices between each run (*x* and *y* axes) for each participant (panels). The legend on the right is for both A and B.

In Figure 10A, the correlations between participants were overall close to zero or slightly positive (mean of 0.061). This follows our claim that modeling and reporting individual differences in this task is important. However, some pairs of participants had relatively larger correlations. Specifically, Participants 2 and 4 have the highest correlation (of 0.397) between coactivation matrices. To compare, the highest correlation across runs within a single participant (Figure 10B) is Participant 11, with an average of 0.267. Thus, the correlation between those two participants was higher than any participant's average correlation to their own coactivation matrices between runs. Furthermore, participants also vary greatly in Figure 10B. However, overall, participants have higher correlations from run to run in their own data than with other participants across runs. To summarize, although there were differences from run to run within the same participant, overall the differences between participants were greater. Therefore, the next step is to provide more evidence for the importance of including individual differences by evaluating model fit.

#### Constraining Connectivity Priors on an Individual Level

A way to test whether or not including individual differences in coactivation is important is to compare model fit between models that assume the coactivation matrix varies for individuals, not runs, with models that do not have this assumption. In other words, we will compare models that have informed (containing information from other runs) priors on Σ to models that have uninformed priors on Σ. To do this, we compared model fit between nine different model/data combinations. The deviance information criterion (DIC; Spiegelhalter, Best, Carlin, & van der Linde, 2002) was used as a measure of model fit. DIC measures model fit while penalizing for complexity, but because the models all have the same level of complexity, the differences in DIC show only model fit.

We first fit three “uninformed models.” Here, M2 was fit to each run separately with an uninformed prior on the covariance matrix (e.g., setting the scale matrix of the inverse Wishart distribution to an identity matrix). Then, using the estimated covariance matrix for those three models, we obtained a more informed estimate of the scale matrix. The expectation of an inverse Wishart distribution is the product of the degrees of freedom and the scale matrix. Thus, to approximate the scale matrices to be used in the “informed” model fit, the covariance matrices estimated by the “uninformed” model were averaged and then divided by a fixed degree of freedom (24). Approximate scale matrices were obtained for all three runs. The other six model fits are then the informed model fits (e.g., we fit Run 1 informed by Run 2, and fitting Run 1 informed by Run 3). The DIC values were obtained within JAGS. Table 2 shows the differences between the informed and uninformed runs, and the sample standard error. A positive difference value means that the informed model is better than the uninformed model. For Runs 1 and 2, the model fits are much better for when the model is informed by other runs. For Run 3, the difference is small, but the standard errors are large, so the data from Runs 1 and 2 neither significantly improved nor hurt the model fit. Overall, including individualized run data improved model fits to other runs, providing further evidence for the importance of individual differences.

Fit to
. | . | Informed by
. | ||
---|---|---|---|---|

Run 1
. | Run 2
. | Run 3
. | ||

Run 1 | Difference | — | 78.31265 | 26.4543 |

Sample standard error | — | 7.409526 | 6.270958 | |

Run 2 | Difference | 19.38235 | — | 19.11392 |

Sample standard error | 6.645165 | — | 6.431498 | |

Run 3 | Difference | −8.990306 | −5.408381 | — |

Sample standard error | 6.88419 | 7.771539 | — |

Fit to
. | . | Informed by
. | ||
---|---|---|---|---|

Run 1
. | Run 2
. | Run 3
. | ||

Run 1 | Difference | — | 78.31265 | 26.4543 |

Sample standard error | — | 7.409526 | 6.270958 | |

Run 2 | Difference | 19.38235 | — | 19.11392 |

Sample standard error | 6.645165 | — | 6.431498 | |

Run 3 | Difference | −8.990306 | −5.408381 | — |

Sample standard error | 6.88419 | 7.771539 | — |

The table shows the differences in DIC fit statistics and sample standard errors between M2 fit with an uniformed prior on Σ. Positive values indicate a better fit in models informed by an individual's coactivation matrices on another run.

## DISCUSSION

Through the application of hierarchical Bayesian models, we demonstrated the importance of individual differences in response inhibition tasks. First, through a simulation study, we showed that our hierarchical Bayesian model outperformed the C-GLM in terms of recovering and differentiating between signal and sources of noise, especially in contexts where individual differences are present. Second, we observed individual differences in condition-wise activation in both the go/no-go and stop signal tasks. Additionally, we found task differences characterized by a brain-wide deactivation following a stop signal (but not following a no-go cue). Third, the coactivation matrices demonstrated both task-wise and strong individual differences in the go/no-go and stop signal tasks. Fourth, we distinguished the individual variability we observed from run-to-run variability, sample size, and method of analysis. In this discussion, we further explore whether the variability observed is actually a result of individual differences, relate our findings to the response inhibition literature, and discuss limitations and further directions.

We argue that the variability observed across individuals in ROI activation and functional connectivity is a result of individual differences. However, variation could arise from a variety of sources. This is a reasonable hesitation especially when considering findings of significant day-to-day or session-to-session variability (Noble et al., 2017; Pannunzi et al., 2017). However, Gratton et al. (2018) found that daily or session-by-session factors accounted for only a small proportion of variability, given sufficient data across runs and participants. Additionally, they found that these functional networks (analogous, though as noted in the Introduction, not identical, to our coactivation matrices) were stable across individuals. Stable individual differences in functional connectivity have been found across a variety of domains, both at rest and (to a lesser degree) during different tasks (Gordon et al., 2017; Finn et al., 2015). To explore this possible issue within the response inhibition data presented above, we examined run-to-run differences in functional connectivity and model fit statistics. We found higher correlations within individual participants across runs than between different participants. Additionally, model fit improved when accounting for individual features of the correlation matrices in the other runs. Taken with previous findings, this suggests the primary source of the observed variability is individual differences.

Although individual differences are known to be influential in cognitive control (Miyake & Friedman, 2012), not all of our findings are in line with major theories of response inhibition. Although some aspects of our analyses corroborate major findings, other aspects contradict these findings, so we discuss some of these relationships and propose possible explanations for discrepancies. First, we noted differences between the go/no-go and stop signal tasks in terms of both conditional differences and ROI coactivations. This coincides with evidence from meta-analyses of versions of go/no-go and stop signal tasks that found different neural correlates and suggested that different systems may be involved between the two tasks (Swick et al., 2011; Rubia et al., 2001). Second, we did not observe increased activation in the right IFG in response to a stop signal. The right IFG is hypothesized to act as a brake within a network of outright stopping also involving the preSMA and subthalamic nucleus. This is a robust result that has been observed in fMRI, EEG, animal, and lesion studies (Aron et al., 2014). Although aspects of this hypothesis are still debated, especially regarding lateralization of the IFG (Swick, Ashley, & Turken, 2008) and involvement of entire networks (Hampshire & Sharp, 2015), there may be other factors that contributed to the observed deactivation in the right IFG in our analyses. One possibility is that our stop signal task does not follow the convention of having a minority of stop signal trials. Aron et al. (2014) argue that increases in proportion can turn the task into a decision-making task, as opposed to a response inhibition task. Another factor could be the size of the right IFG region in our analyses; at 2830 voxels, the right IFG was the largest ROI in our analyses. Based on evidence that a subregion of the right IFG (pars opercularis) may be responsible for stopping, our ROI may be too large to detect activation in this subregion (Levy & Wagner, 2011).

A possible limitation of our models is the stability of the coactivation matrices across conditions. In a meta-analysis, Swick et al. (2011) found different networks may be utilized during go/no-go tasks and stop signal tasks. Because our stop signal task combines both components of not going and stopping, based on these studies, it would be a reasonable assumption that connectivity measures would be different condition to condition, as there is evidence that there are different networks corresponding to each process. The purpose of our analyses was on individual differences across the task as a whole, but a next step would be to apply these models to more (or different) data to see how, for example, connectivity differs in inhibiting and initiating a response. In our analyses, having only 16 no-go trials per person was insufficient data to estimate a coactivation matrix. However, we may also expect not to see any differences across these matrices (if fit to sufficient data), as Gratton et al. (2018) found that functional networks varied mostly by individuals and only to a small degree by task. However, the analyses presented by Gratton et al. collapsed across the time series of the BOLD response within a task, so the degree to which different coactivation matrices are needed remains an open question.

A major limitation of our models is that they have no behavioral component. For example, the models treat unsuccessful and successful response inhibition the same. Although there is some evidence to suggest that some components of the neural response to a stop signal are the same regardless of successful inhibition (Aron & Poldrack, 2006), we believe this to be a strong assumption. Furthermore, these models provide no mechanistic explanation of the cognitive processes behind going, not going, and stopping. Numerous models have been proposed to represent the cognitive processes in the stop signal task (Logan, Van Zandt, Verbruggen, & Wagenmakers, 2014; Matzke, Dolan, Logan, Brown, & Wagenmakers, 2013; Logan & Cowan, 1984). However, only a few of these models incorporate neural data (Logan, Yamaguchi, Schall, & Palmeri, 2015; Boucher, Palmeri, Logan, & Schall, 2007). By linking the neural data to the behavioral data (Turner, Forstmann, & Steyvers, 2018; Turner, Forstmann, Love, Palmeri, & Van Maanen, 2017; Turner, Rodriguez, Norcia, McClure, & Steyvers, 2016; Turner, Van Maanen, & Forstmann, 2015; Turner, Forstmann, et al., 2013), the set of extant cognitive models can be further constrained, potentially providing an opportunity to better understand how response inhibition is carried out in the brain from a mechanistic perspective.

### Conclusions

Here, hierarchical Bayesian models revealed the ubiquity of individual differences in the neural processes underlying response inhibition. The models we constructed outperformed a standard analysis in separating signal from noise in fMRI data, especially when accounting for individual and trial-to-trial variability. The simultaneous group and individual estimates revealed the different dynamics in going, not going, and stopping on a group level while preserving individuality. Finally, analyses of coactivation between ROIs estimated by the models demonstrated the prevalence of individual differences within functional connectivity.

## APPENDIX: MODEL SPECIFICATION

*N*

_{i,k,j,r,t}for every time point

*t*(and thus every stimuli

*i*and condition

*k*) for every participant

*j*and ROI

*r*. In the neural likelihood, $\beta j,r0$ denotes an intercept term for baseline activation for each participant and ROI, and ϵ(

*t*) is an error term described below. These terms are added to the convolved hemodynamic response functions across the number of stimuli presentations,

*R*. Here, we assume the hemodynamic response function

*h*

_{0,i}is a double-gamma model (Glover, 1999; Boynton, Engel, Glover, & Heeger, 1996), with fixed shape parameters,

*a*

_{1}= 6,

*a*

_{2}= 16,

*b*

_{1}= 1,

*b*

_{2}= 1, and

*c*= 1/6.

_{j,r}is set to

**X**is the design matrix of conditions and onsets and

**β**is normally distributed:

_{j,k,r}and standard deviation $\sigma r\beta $. $\sigma r\beta $ varies across ROIs, with the vague prior:

_{p}(

*a*,

*b*) denotes a

*p*-dimensional multivariate normal distribution with mean vector

*a*and variance–covariance matrix

*b*.

**μ**

_{k,1:R}refers the

*k*th row of

**μ**where

**δ**. The hyper prior for

**μ**

_{k,1:R}is again a 24-dimensional multivariate normal distribution

**ϕ**

_{0}is a 24-dimensional vector of zeros and

*s*

_{0}is a (24 × 24) identity matrix. The variance of

**δ**is governed by Σ, a 24 × 24 variance–covariance matrix to capture patterns of pairwise coactivation between the 24 ROIs. In M1, we assumed these patterns to be similar across all participants. The prior for Σ follows an inverse Wishart distribution

*I*

_{0}is a (24 × 24) identity matrix and

*n*

_{0}= 24 is the degrees of freedom. An inverse Wishart prior was chosen because it is a conjugate prior for Σ and is thus computationally convenient.

**δ**. In M2, we assume that individual differences exist in the patterns of coactivation, and thus, Σ is estimated for every participant

*j*, so

*I*

_{0}and

*n*

_{0}are defined the same way as in M1. Additionally, M2 now uses Σ

_{j}instead of Σ in the prior for δ

**μ**is defined the same way as in M1.

*k*indices the four conditions.

Reprint requests should be sent to Brandon M. Turner, Department of Psychology, The Ohio State University, 1827 Neil Avenue, Columbus, OH 43210-1132, or via e-mail: turner.826@gmail.com.