## Abstract

The dorsal attention network is consistently involved in verbal and visual working memory (WM) tasks and has been associated with task-related, top–down control of attention. At the same time, WM capacity has been shown to depend on the amount of information that can be encoded in the focus of attention independently of top–down strategic control. We examined the role of the dorsal attention network in encoding load and top–down memory control during WM by manipulating encoding load and memory control requirements during a short-term probe recognition task for sequences of auditory (digits, letters) or visual (lines, unfamiliar faces) stimuli. Encoding load was manipulated by presenting sequences with small or large sets of memoranda while keeping the number of sensory stimuli constant. Top–down control was manipulated by instructing participants to passively maintain all stimuli or to selectively maintain stimuli from a predefined category. Using ROI and searchlight multivariate analysis strategies, we observed that the dorsal attention network encoded information for both load and control conditions in verbal and visuospatial modalities. Decoding of load conditions was additionally observed in modality-specific sensory cortices. These results highlight the complexity of the role of the dorsal attention network in WM by showing that this network supports both quantitative and qualitative aspects of attention during WM encoding, and that it does so in a partially modality-specific manner.

## INTRODUCTION

An important characteristic of neural networks associated with working memory (WM) tasks is their sensitivity to WM load. This WM load sensitivity is more particularly observed in the dorsal attention network, involving the intraparietal cortices and the superior frontal gyri. A common assumption is that the dorsal attention network exerts a role of top–down attentional control during WM tasks by considering that higher WM load is also associated with higher attentional control demands (e.g., Majerus et al., 2012, 2016; Cowan et al., 2011; Postle, 2006; Corbetta & Shulman, 2002). At the same time, the precise nature of these attentional processes in WM remains a controversial question. On the behavioral level, it has been shown that WM load and attentional control account for variability in behavior that is partly shared and partly unique to load or control (Cowan, Fristoe, Elliott, Brunner, & Saults, 2006). The aim of this study was to clarify the role of dorsal attention network involvement in WM by distinguishing nonstrategic, quantitative aspects associated with encoding load and strategic, qualitative aspects associated with top–down task-related attentional control.

## METHODS

### Participants

Valid data were obtained for 26 right-handed native French-speaking young adults (14 men; mean age = 23.12 years, age range 20–33) recruited from the university community, with no history of psychological or neurological disorders. The data of five additional participants were discarded due to incomplete data acquisition (four participants) or sudden head movement resulting in volume-to-volume displacement exceeding 9 mm and 15° (one participant). The study was approved by the ethics committee of the Faculty of Medicine of the University of Liège and was performed in accordance with the ethical standards described in the Declaration of Helsinki (1964). All participants gave their written informed consent before their inclusion in the study.

The stimulus material consisted of digits (1–9), names of consonant letters (B, C, D, F, G, H, J, K, L, N, R, S, T, V), unfamiliar faces (nine male faces taken from the FERET database; Phillips, Wechsler, Huang, & Rauss, 1998), and line stimuli presented in nine different orientations. The verbal stimuli had been recorded by a neutral female voice and converted to digital .wav mono sound files (44,100 Hz sampling frequency), with a normalized duration of 300 msec and a mean amplitude approximating 70 dB. The visual stimuli had a size of 397 × 529 pixels and a resolution of 96 ppi (see Figure 1). The stimuli were presented in continuous sequences containing 12 stimuli each, with an ISI of 50 msec; both auditory–verbal and visual stimuli had a presentation duration of 300 msec.

Figure 1.

Description of the experimental design used for the verbal and visual modalities of the WM task. The first two examples represent verbal WM trials, and the two last examples represent visual WM trials. For each of the two examples, the first row represents a low encoding load trial, and the second row represents a high encoding load trial. In the high control conditions, the participants had to selectively encode a specific stimulus category (digits or letters for the verbal WM trials; faces or lines for the visual WM trials), whereas in the low control condition, the participants encoded the stimuli as they appeared.

For the low load condition, each stimulus was repeated once immediately after its first presentation to reduce the number of different items to be maintained in the focus of attention while ensuring that the same number of sensory stimuli was presented as in the high load condition, in which every successive item was different; the temporal separation of 50 msec between two adjacent items ensured that repeated items were still perceived, at the sensory level, as two successive auditory stimulations (see Figure 1). For the auditory–verbal sequences, the presentation of letters and digits was alternated within the same sequence to allow, for the high top–down control condition, selective encoding of only one of the two stimulus categories (see below). The same alternation also characterized visual sequences, with a regular alternation between line and face stimuli; again, a 50-msec black screen separating two successive items ensured that repeated items were perceived as two successive sensory events (see Figure 1).

At the beginning of each sequence, an instruction screen displayed for 1500 msec informed participants about the type of control that was required (high control: stimulus selection; low control: no stimulus selection) and, in the high control condition, about the stimulus category they had to focus on. In the high control condition, the different stimulus categories (letters, digits; faces, lines) were targeted for an equal number of trials. During the entire task, the background of the screen was black. An instruction “In the list?” appeared 1000 msec after the last item of each sequence; the participants then either heard a probe stimulus (auditory–verbal condition) or saw a probe stimulus (visual condition) and had to decide whether the stimulus had been in the list or not. To ensure that the participants' responses were based on active WM representations and not familiarity-based recognition judgments, participants were instructed to respond “yes” only if they were certain about their response. Participants pressed the response button under their middle finger for “yes” (i.e., definitely in the list) and the button under their index finger for “no.” Nonmatching stimuli were stimuli not presented in the list but from the same stimulus category as the target stimulus category. Also, given that for rapid, continuous sequence presentation paradigms, information within the focus of attention is very quickly lost and updated, the number of positive trials largely exceeded the number of negative trials, with a maximum ratio of one negative trial for four positive trials.
This procedure was motivated by the fact that negative probes would not be informative about the content of information held in WM; pilot data had shown that, although recognition accuracy is up to 90% when items from the two most recently presented serial positions are probed, performance drops quickly to chance level for earlier serial positions.

Participants were allowed 3000 msec for giving their response; if the participant did not respond within the given time, a no response was recorded. After the response or after the 3000-msec response waiting time in case of no response, there was a fixation intertrial interval of similar duration to that of the trials, ensuring proper separation of the brain signal associated with each trial (two successive random Gaussian distributions, a first one being centered on a mean duration of 3000 ± 1000 msec and a second one being centered on a mean duration of 5500 ± 1100 msec, amounting to a total mean duration of 8500 msec). Finally, baseline trials, controlling for basic sensorimotor and decision processes involved in the tasks, were also presented. They had the same structure as the experimental trials, except that they consisted of the continuous repetition of a single stimulus, and during the response stage, a perceptually identical or an acoustically reversed/contrast-reversed stimulus was presented; participants simply had to judge the perceptual “normality” of the probe stimulus relative to the standard presentation of the stimuli across the task.
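The jittered intertrial interval described above can be illustrated with a short sketch. It assumes that the "±" values are standard deviations and that negative draws are clipped at zero; neither detail is stated in the text, and the function name `sample_iti_msec` is hypothetical.

```python
import random


def sample_iti_msec(rng):
    """Draw one intertrial interval as the sum of two Gaussian draws (msec)."""
    # Assumption: 1000 and 1100 msec are standard deviations of the two draws.
    first = rng.gauss(3000, 1000)
    second = rng.gauss(5500, 1100)
    # Assumption: negative durations are clipped at zero; the handling of
    # distribution tails is not described in the text.
    return max(0.0, first) + max(0.0, second)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
itis = [sample_iti_msec(rng) for _ in range(20000)]
mean_iti = sum(itis) / len(itis)
print(f"mean ITI over {len(itis)} draws: {mean_iti:.0f} msec")
```

Over many draws the mean interval converges on the 8500 msec total reported above.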

For each stimulus modality, there were 20 trials for each of the four cells resulting from the crossing of the different conditions (low load–low control, low load–high control, high load–low control, high load–high control), as well as 20 baseline trials. The auditory–verbal and visual trials were presented in two different sessions on the same day, and the order of the sessions was randomly assigned to participants; the two sessions were separated by the acquisition of a T1 structural brain scan (see below). This allowed us to make between-modality predictions of load and control conditions based on independent data sets. Furthermore, participants completed a practice session for both verbal and visual trials outside the scanner before the start of the experiment to ensure that they had understood the difference between the high and low control instructions and complied with task requirements.

### fMRI Analyses

#### Image Preprocessing

Data were preprocessed and analyzed using SPM12 software (version 12.0; Wellcome Department of Imaging Neuroscience, www.fil.ion.ucl.ac.uk/spm) implemented in MATLAB (Mathworks Inc., Natick, MA) for univariate analyses. EPI time series were corrected for motion and distortion with “Realign and Unwarp” (Andersson, Hutton, Ashburner, Turner, & Friston, 2001) using the generated field map together with the FieldMap toolbox (Hutton et al., 2002) provided in SPM12. A mean realigned functional image was then calculated by averaging all the realigned and unwarped functional scans and the structural T1-image was coregistered to this mean functional image (rigid body transformation optimized to maximize the normalized mutual information between the two images). The mapping from subject to Montreal Neurological Institute space was estimated from the structural image with the “unified segmentation” approach (Ashburner & Friston, 2005). The warping parameters were then separately applied to the functional and structural images to produce normalized images of resolution 2 × 2 × 2 mm3 and 1 × 1 × 1 mm3, respectively. Finally the warped functional images were spatially smoothed with a Gaussian kernel of 4 mm FWHM to improve signal-to-noise ratio while preserving the underlying spatial distribution (Schrouff, Kussé, Wehenkel, Maquet, & Phillips, 2012); this smoothing also diminishes the impact residual head motion can have on MVPA performance, even after head motion correction (Gardumi et al., 2012).

#### Univariate Analyses

Univariate analyses first assessed brain activity levels associated with attentional control and load manipulations in the verbal and visual WM tasks. For each participant, brain responses were estimated at each voxel, using a general linear model with event-related and epoch-related regressors. For each WM task (session), the design matrix contained four regressors modeling the encoding phase with two regressors for the control conditions (high, low) and two regressors for the load conditions (high, low); an additional regressor modeled the response phase. The sensory and motor control trials were modeled implicitly. The regressors resulted from the convolution of the onset and duration parameters for each event of interest with a canonical hemodynamic response function. The design matrices for each session also included the session-specific realignment parameters to account for any residual movement-related effect. A high-pass filter was implemented using a cutoff period of 128 sec to remove the low-frequency drifts from the time series. Serial autocorrelations were estimated with a restricted maximum likelihood algorithm with an autoregressive model of order 1 (+white noise). For each design matrix, linear contrasts were defined for the two attentional control conditions and the two attentional load conditions. For each task, the resulting contrast images, after additional smoothing by 6 mm FWHM, were entered in a second-level, random effect ANOVA analysis to assess control and load responsive brain areas at the group level. The additional smoothing was implemented to reduce noise due to intersubject differences in anatomical variability and to reach a more conventional filter level for group-based univariate analyses ($\sqrt{4^2 + 6^2} = 7.21$ mm; Mikl et al., 2008).
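The effective filter level of 7.21 mm follows from the fact that successive Gaussian kernels combine in quadrature: their variances add, and FWHM is proportional to the standard deviation. A minimal check, with `combined_fwhm` as a hypothetical helper name:

```python
import math


def combined_fwhm(*fwhms_mm):
    """Effective FWHM (mm) after applying several Gaussian kernels in sequence."""
    # Variances of convolved Gaussians add, so FWHMs combine as a root sum of squares.
    return math.sqrt(sum(f ** 2 for f in fwhms_mm))


# 4 mm at preprocessing, then 6 mm on the first-level contrast images.
effective = combined_fwhm(4, 6)
print(round(effective, 2))  # 7.21
```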

#### Multivariate Analyses

Multivariate analyses were conducted using PRoNTo, a pattern recognition toolbox for neuroimaging (www.mlnl.cs.ucl.ac.uk/pronto; Schrouff et al., 2013). They were used to determine the voxel patterns discriminating between the different control and load trials at an individual subject level. We trained classifiers to distinguish whole-brain voxel activation patterns associated with high versus low control and with high versus low load in the preprocessed and 4-mm smoothed functional images for the verbal and visual WM encoding events separately, using a binary support vector machine (Burges, 1998). For within WM modality classifications of control and load conditions, a leave-one-block-out cross-validation procedure was used. For cross WM modality predictions of load and control conditions, a leave-one-run-out cross-validation procedure was used, resulting in training the classifier on one modality and testing the classifier on the other modality. At the individual level, classifier performance was assessed by running permutation tests on individual balanced classification accuracies (Npermutation = 1000, p < .05). At the group level, classifier performance was tested by comparing the group-level distribution of classification accuracies to a chance-level distribution using Bayesian one sample t tests; Bayesian statistics were used given their robustness in case of small-to-moderate sample sizes and nonnormal distributions (Moore, Reise, Depaoli, & Haviland, 2015) and because, with these analyses, the bias toward accepting or rejecting the null hypothesis does not change with sample size. Furthermore, Bayesian statistics assess evidence for a model under investigation in the light of the data, whereas group-level classical t tests make population-level inferences; population-level inferences using a classical t test have been shown to be problematic when comparing classification accuracies against chance-level (Allefeld, Gorgen, & Haynes, 2016). 
A Bayes factor (BF10) greater than 3 was considered as providing moderate evidence in favor of above-chance-level classification accuracy, a BF10 greater than 10 as providing strong evidence, a BF10 greater than 30 as providing very strong evidence, and a BF10 greater than 100 as providing decisive evidence (Lee & Wagenmakers, 2013). A BF10 smaller than the reciprocal of each number (<1/3, <1/10, <1/30, and <1/100) serves as commensurate evidence favoring the null, a conclusion that could not be drawn using null hypothesis statistical testing. Note that a Bayesian analysis approach was also used to assess group-level behavioral performance by using Bayesian ANOVA. A standard mask removing voxels outside the brain was applied to all images, and all models included timing parameters for hemodynamic response function delay (5 sec) and hemodynamic response function overlap (5 sec), ensuring that stimuli from different categories falling within the same 5 sec were excluded (Schrouff et al., 2013). The whole-brain multivariate analyses were followed up by ROI analyses to determine the role of the frontoparietal cortices of the dorsal attention network in the discrimination of the different load and control conditions (see the Results section for more details).
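The evidence categories above amount to a simple lookup on BF10. The sketch below is only an illustration of those thresholds; the function name `evidence_label` and the wording of the returned labels are assumptions made for this example.

```python
def evidence_label(bf10):
    """Map a Bayes factor (BF10) onto the evidence categories described above."""
    if bf10 <= 0:
        raise ValueError("BF10 must be positive")
    # Check the most extreme category first so the strongest label wins.
    categories = [(100, "decisive"), (30, "very strong"), (10, "strong"), (3, "moderate")]
    for threshold, label in categories:
        if bf10 > threshold:
            return f"{label} evidence for above-chance accuracy"
        if bf10 < 1 / threshold:
            return f"{label} evidence for the null"
    # Values between 1/3 and 3 support neither hypothesis.
    return "inconclusive"


for value in (1.47, 28.61, 348.20, 0.22):
    print(value, "->", evidence_label(value))
```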

Finally, in addition to the ROI analyses, a searchlight decoding approach was used to determine the local spatial distribution of the voxels that discriminate between the different conditions (Kriegeskorte, Goebel, & Bandettini, 2006). A searchlight sphere of 10 mm was applied on the whole-brain multivariate feature map, and the classification accuracy of each voxel cluster was determined, using ad hoc code built for the PRoNTo toolbox and available at https://github.com/CyclotronResearchCentre/PRoNTo_SearchLight. Significance of searchlight classifications was also assessed at both individual and group levels, given that analyses of group-level classification accuracies can indicate small above-chance-level classifications as being significant while, at the individual level, only few (if any) participants may show significant classification accuracies. It is therefore important to also consider the prevalence of the effect across participants and not only the mean classification rate of the group (Allefeld et al., 2016). To obtain significance values for individual-level classification accuracies, we used binomial tests indicating the classification accuracy threshold at which voxels are significant at p < .05 according to a binomial distribution (Noirhomme et al., 2014). For displaying searchlight results, a prevalence image was built from the individual searchlight classification maps by counting, for each voxel, the number of participants whose classification accuracy exceeded the binomial significance threshold. This led to a prevalence image indicating the proportion of participants showing significant classification accuracies for a given voxel.
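The binomial threshold and the prevalence image can be sketched in a few lines. This is an illustration only, not the toolbox code: the function names, the trial count of 40, and the toy accuracy maps are all hypothetical, and a one-sided test against a chance level of .5 is assumed.

```python
from math import comb


def binomial_accuracy_threshold(n_trials, chance=0.5, alpha=0.05):
    """Smallest accuracy k/n whose upper-tail probability under chance is below alpha."""
    tail = 0.0
    threshold_k = n_trials + 1  # sentinel: no attainable accuracy is significant
    # Accumulate P(X >= k) from k = n downward until the tail reaches alpha.
    for k in range(n_trials, -1, -1):
        tail += comb(n_trials, k) * chance ** k * (1 - chance) ** (n_trials - k)
        if tail >= alpha:
            break
        threshold_k = k
    return threshold_k / n_trials


def prevalence_map(accuracy_maps, threshold):
    """Proportion of participants whose accuracy reaches the threshold, per voxel."""
    n_participants = len(accuracy_maps)
    n_voxels = len(accuracy_maps[0])
    return [
        sum(m[v] >= threshold for m in accuracy_maps) / n_participants
        for v in range(n_voxels)
    ]


threshold = binomial_accuracy_threshold(40)  # hypothetical: 40 test trials
# Toy data: three participants, four voxels each.
maps = [
    [0.70, 0.50, 0.66, 0.60],
    [0.55, 0.52, 0.68, 0.66],
    [0.72, 0.48, 0.60, 0.64],
]
prevalence = prevalence_map(maps, threshold)
print(threshold, prevalence)
```

Because the threshold is defined here as the smallest significant accuracy, the comparison uses "greater than or equal to"; with a strict threshold the comparison would be strict instead.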

## RESULTS

### Behavioral Analyses

A first group-level within-subject Bayesian ANOVA assessed the effects of Load (high, low) and Control (high, low) manipulations as well as Task modality (auditory–verbal, visual) on recognition accuracy (for positive trials). Specific effect analysis showed that there was decisive evidence for the inclusion of Load (BFInclusion = 1.57e+15) and Control (BFInclusion = 3.22e+15) effects, whereas evidence for the inclusion of Modality (BFInclusion = 0.85) or any of the interactions was very low (all BFInclusion < 1.50). As shown in Figure 2A and B, recognition accuracy was, as expected, higher for the low versus high load condition and for the high versus low control condition in the auditory–verbal and visual WM tasks, respectively. The fact that recognition accuracy was higher for the high versus low control condition confirms that participants were able to selectively focus on target stimuli only, reducing the overall number of stimuli to be maintained. A further analysis assessed response bias by calculating d′ and C scores (Brophy, 1986), collapsing conditions within each modality to obtain reliable estimates of the rejection rates for the rarely occurring negative trials (see Methods). A Bayesian paired t test on d′ scores in the auditory–verbal and visual modalities showed very low evidence for an effect of Modality, BF10 = 0.22, with d′ values showing reliable discrimination of positive and negative trials in both the auditory–verbal (mean = 1.80 ± 0.59) and visual (mean = 1.73 ± 0.50) modalities. A Bayesian paired t test on C scores also showed very low evidence for an effect of Modality, BF10 = 0.30, with C scores indicating an overall conservative response criterion in the auditory–verbal (mean = 0.14 ± 0.24) and visual (mean = 0.22 ± 0.38) modalities, confirming that participants complied with the task instruction to accept targets only if they were certain about their response.
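For readers unfamiliar with the signal detection measures used above, d′ and C can be computed from response counts as follows. This is a generic sketch under stated assumptions: the log-linear correction (adding 0.5 to each count) is one common way to avoid infinite z scores and is not necessarily the correction used in this study, and the counts in the example are invented.

```python
from statistics import NormalDist


def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Return (d', C) for a yes/no recognition task from raw response counts."""
    # Assumption: log-linear correction keeps rates away from 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    # Positive C indicates a conservative criterion (a bias toward "no").
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Invented counts: 50 positive probes and 16 negative probes.
d_prime, criterion = dprime_and_criterion(30, 20, 2, 14)
print(f"d' = {d_prime:.2f}, C = {criterion:.2f}")
```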

Figure 2.

An analysis of RTs for positive trials showed again very strong evidence for the effects of Load (BFInclusion = 1508.26) and Control (BFInclusion = 3343.69), whereas all interactions were associated with very low evidence (all BFInclusion < 0.65); this analysis, however, also revealed very strong evidence for a Modality effect (BFInclusion = +∞). As shown in Figure 2C and D, RTs were faster for the low versus high load conditions and for the high versus low control conditions, but they were also faster for the visual versus auditory–verbal modality. The results confirm that participants were able to implement a selective encoding strategy in the high control condition, which led to faster access to items selectively held in the focus of attention. The general faster RTs in the visual modality are likely to reflect the temporal nature of auditory–verbal stimuli, which can only be identified after a sufficient portion of the acoustic signal, which unfolds over time, has been presented: The mean difference in poststimulus onset RTs (315 ± 193 msec) is indeed equivalent to the duration of the acoustic signal of the auditory probe stimulus (300 msec).

### Neuroimaging—Univariate Analyses

A first set of neuroimaging analyses assessed the effects of load and control on univariate neural activity changes in the verbal and visual WM tasks. As expected, when contrasting the different load and control conditions, univariate analyses yielded few significant differential activity foci in the frontoparietal dorsal attention network of interest, as shown in Table 1. The only contrast that yielded significant between-condition activity differences was the high versus low control contrast in the verbal WM task: the left intraparietal sulcus, covering the horizontal segment from anterior to posterior portions (see Figure 3 and Table 1), and the left dorsolateral pFC showed increased activity at pFWE_corrected < .05 (with a cluster-forming threshold of puncorrected < .001 at the voxel level; Eklund, Nichols, & Knutsson, 2016) for the high control condition, in line with an involvement of the dorsal attention network in top–down attentional control. This was paralleled by increased activity for the low control condition in the bilateral fronto-orbital cortex and, at uncorrected levels, in the bilateral TPJ; these regions are part of the ventral attention network, which is involved in stimulus-driven attention and has been shown to be activated (or less deactivated) when task-related top–down control is low (Majerus et al., 2012; Todd et al., 2005; Corbetta & Shulman, 2002). Finally, as shown by a conservative conjunction null analysis (Friston, Penny, & Glaser, 2005) over all conditions in the verbal and visual WM tasks (see Table 1), frontoparietal cortices defining the dorsal attention network were strongly activated above baseline, with activity in the intraparietal sulcus covering the entire horizontal segment, from anterior to posterior portions, in both hemispheres. Next, we used multivariate analyses to determine to what extent neural activity patterns distinguish between the different load and control conditions.

Table 1.

Univariate Activity Foci, as a Function of Control and Load Contrasts, as well as a Conjunction Analysis over All Conditions

Anatomical Region | Brodmann Area | No. Voxels | Left/Right | x | y | z | SPM {Z}-value
High > low control (auditory)
Inferior frontal gyrus 9/44 637 −40 12 28 4.83
Intraparietal sulcus (posterior) 385 −26 −60 42 4.46*
Low > high control (auditory)
OFC 10 371 −4 60 12 4.35*
No voxel survived threshold
High vs. low control (visual)
No voxel survived threshold
No voxel survived threshold
Conjunction null analysis (auditory and visual; all conditions)
Anterior cingulate 6/32 1753 −6 14 46 >7.76
Superior frontal gyrus  −28 −2 52 >7.76
742 46 56 7.76
36 58 7.25
Middle frontal gyrus  −38 28 30 5.59
223 38 40 26 7.05
Inferior frontal gyrus/insula 47 423 −30 26 >7.76
47 529 34 24 >7.76
Inferior frontal gyrus 44  −38 18 26 5.37
44 202 42 12 22 >7.76
Intraparietal sulcus 7/40 4053 −40 −40 42 7.56
−34 −50 42 >7.76
−28 −58 42 >7.76
7/40 766 50 −34 52 >7.76
38 −42 42 5.92
32 −62 42 6.00
Superior temporal gyrus 20 19 −48 −46 12 5.32
20 222 56 −46 12 6.98
41 14 14 5.64
Cerebellum  135 −28 −62 −28 >7.76
75 34 −60 −28 6.35
All regions are significant at p < .05, with voxel-level and/or cluster-level family-wise error (FWE) corrections for whole-brain volume.

*p < .05 for cluster-level FWE corrections only, with a cluster-forming threshold of puncorrected < .001 at the voxel level.

Figure 3.

### Neuroimaging—Whole-brain and ROI Multivariate Analyses

Figure 4.

The whole-brain analyses were followed up by ROI analyses, focusing on the intraparietal sulci and the superior frontal gyri that define the dorsal attention network, as well as the dorsolateral pFC, which has also been associated with WM load in previous studies; to increase the precision of these analyses, the intraparietal sulcus ROIs were further segmented in anterior, middle, and posterior portions (Gillebert, Mantini, Peeters, Dupont, & Vandenberghe, 2013; Gillebert et al., 2012). The ROIs were defined as spheres with a radius of 10 mm selected from previously published studies with the following coordinates: x = ±43, y = −40, z = 43, x = ±34, y = −49, z = 45, and x = ±26, y = −60, z = 41, for the left/right anterior, middle, and posterior intraparietal sulci, respectively; x = −20/+26, y = −1/−2, z = 50/47 for the left/right superior frontal gyrus; and x = −40, y = 13, z = 28 for the left dorsolateral pFC (Majerus et al., 2012, 2016; Gillebert et al., 2012, 2013; Asplund et al., 2010). The ROI analyses proceeded as for the whole-brain multivariate analyses but by limiting the feature selection to the target ROIs. 
Using Bayesian one-sample t tests on classification accuracies for the attention control conditions in the auditory–verbal modality, very strong evidence for group-level above-chance discrimination was observed in the anterior intraparietal sulcus (left: mean classification accuracy = .64 ± .07, BF10 = 6.55e+7; right: mean classification accuracy = .60 ± .07, BF10 = 78,079.40), in the middle intraparietal sulcus (left: mean classification accuracy = .66 ± .08, BF10 = 2.47e+7; right: mean classification accuracy = .62 ± .08, BF10 = 77,084.22), in the posterior intraparietal sulcus (left: mean classification accuracy = .58 ± .09, BF10 = 348.20; right: mean classification accuracy = .60 ± .09, BF10 = 3199.10), in the superior frontal gyrus (left: mean classification accuracy = .61 ± .07, BF10 = 315,932.99; right: mean classification accuracy = .59 ± .07, BF10 = 97,078.90), and in the dorsolateral pFC (mean classification accuracy = .65 ± .08, BF10 = 1.79e+7). Very similar results were observed when conducting the same ROI analysis on the discrimination of load conditions in the auditory–verbal modality, with very strong evidence for group-level above-chance discriminations in the anterior intraparietal sulcus (left: mean classification accuracy = .57 ± .07; BF10 = 4784.00; right: mean classification accuracy = .58 ± .06, BF10 = 10213.00), in the middle intraparietal sulcus (left: mean classification accuracy = .56 ± .06, BF10 = 692.70; right: mean classification accuracy = .57 ± .08, BF10 = 293.96), in the posterior intraparietal sulcus (left: mean classification accuracy = .57 ± .06, BF10 = 4549.50; right: mean classification accuracy = .57 ± .05, BF10 = 124,050.40), in the superior frontal gyrus (left: mean classification accuracy = .55 ± .04, BF10 = 31498.57; right: mean classification accuracy = .55 ± .06, BF10 = 137.36), and in the dorsolateral pFC (mean classification accuracy = .58 ± .06, BF10 = 73,285).

When applying the same ROI analyses to the visual modality, robust evidence for above-chance discrimination of both control and load conditions was also observed in regions of the dorsal attention network. For the control conditions, strong to very strong evidence for group-level above-chance discrimination was observed in the anterior intraparietal sulcus (left: mean classification accuracy = .55 ± .05, BF10 = 597.90; right: mean classification accuracy = .54 ± .05, BF10 = 103.90), in the middle intraparietal sulcus (left: mean classification accuracy = .58 ± .06, BF10 = 87,388.90; right: mean classification accuracy = .55 ± .07, BF10 = 28.61), in the posterior intraparietal sulcus (left: mean classification accuracy = .57 ± .07, BF10 = 371.80; right: mean classification accuracy = .56 ± .06, BF10 = 956.30), in the superior frontal gyrus (left: mean classification accuracy = .56 ± .05, BF10 = 11,862.12; right: mean classification accuracy = .55 ± .08, BF10 = 24.79), and in the dorsolateral pFC (mean classification accuracy = .57 ± .07, BF10 = 1490.00). For the load conditions, very strong evidence for above-chance discriminations was observed in the anterior intraparietal sulcus (left: mean classification accuracy = .56 ± .05, BF10 = 51,198.00; right: mean classification accuracy = .56 ± .06, BF10 = 865.00), in the middle intraparietal sulcus (left: mean classification accuracy = .58 ± .08, BF10 = 1828.00; right: mean classification accuracy = .57 ± .07, BF10 = 981.83), in the posterior intraparietal sulcus (left: mean classification accuracy = .56 ± .06, BF10 = 2928.50; right: mean classification accuracy = .57 ± .06, BF10 = 2122.40), and in the dorsolateral pFC (mean classification accuracy = .56 ± .05, BF10 = 2277.00). Only in the superior frontal gyrus was the evidence for above-chance discrimination inconclusive (left: mean classification accuracy = .52 ± .06, BF10 = 1.47; right: mean classification accuracy = .53 ± .07, BF10 = 0.831).

### Neuroimaging—Searchlight Analyses

To determine whether neural patterns in additional brain regions outside the dorsal attention network ROIs also contribute to differentiating the control and load conditions in the auditory–verbal and visual WM modalities, a multivariate searchlight analysis was conducted. This analysis revealed that, in addition to the intraparietal sulci and the superior frontal gyri, the target areas of the dorsal attention network, the attentional control conditions could also be reliably distinguished (as indicated by a binomial test; see Methods for further details) in the dorsolateral and ventrolateral pFC in up to 89% of participants for the auditory–verbal modality and up to 77% of participants for the visual modality (see Figure 5A). When running the same analyses for the neural differentiation of attentional load conditions, modality-specific neural patterns in sensory processing areas were identified in the majority of participants (see Figure 5B). In the auditory–verbal modality, most participants showed discriminatory neural patterns (up to 77%) in the superior temporal gyrus close to the auditory cortex, reflecting the differences in auditory encoding load of the high and low load conditions. In the visual modality, most participants (up to 77%) showed discriminatory patterns involving voxels in the bilateral fusiform gyri, reflecting the differences in visual encoding load of the high and low load conditions. At the same time, it is important to note that, for both modalities, neural patterns in the target areas of the dorsal attention network still discriminated between high and low load conditions in up to 65% (auditory modality) and 70% (visual modality) of participants, in line with the ROI analyses.

Figure 5.

### Neuroimaging—Cross-modality Predictions

Figure 6.

## DISCUSSION

This study examined the role of the dorsal attention network in verbal and visual WM tasks by examining to what extent this network supports attentional aspects involved in memory load versus memory control during WM encoding. We observed that multivariate neural patterns in the dorsal attention network were able to discriminate both between the different control conditions and between the different encoding load conditions, within verbal and visual WM modalities. Modality-specific neural patterns were further observed. In the verbal modality, neural patterns in frontotemporal cortices discriminated between high and low control conditions, and neural patterns in superior temporal cortices discriminated between high and low encoding load conditions. In the visual modality, neural patterns in bilateral fusiform gyri discriminated between high and low encoding load conditions. Between-modality predictions of neural patterns associated with top–down control or encoding load conditions were not reliable. Finally, univariate analyses did not show differential levels of activity in the dorsal attention network as a function of load conditions, which, unlike in most previous studies, were matched for differences in physical stimulus numerosity.

First, the results of this study show that, although univariate load effects in the dorsal attention network may be attributable to confounds between encoding load and the physical numerosity of memory stimuli, encoding load can still be decoded from multivariate activity patterns when physical numerosity is held constant. Intraparietal cortex is known to react to differences in numerosity, particularly for physical differences (e.g., different visual quantities) as opposed to abstract differences (e.g., digits symbolizing different numerical quantities; Bulthe et al., 2015; Piazza et al., 2004, 2007). In this study, the sensory events were matched in high and low load conditions as the number of temporally/visually separated physical stimuli was the same in both conditions. Our findings partially mirror the results of a study by Emrich et al. (2013), in which the numerosity of visual stimuli was also held constant (three dot patterns) while varying memory load (one, two, or three dot patterns with movements): Univariate load effects in the frontoparietal cortices did not appear during the encoding WM phase. However, these effects appeared later during the maintenance delay period; the nature of these univariate effects still needs to be clarified, as they may reflect load effects in nonstrategic maintenance of information in the focus of attention and/or top–down control processes involved in structuring and reviewing variable amounts of memoranda during the delay period. Moreover, it could be argued that the fast and continuous presentation of stimuli used in this study may have led to saturation effects in the hemodynamic signal, masking possible univariate load effects during encoding; however, in that case, no univariate differences should have been observed for any contrast, which is contradicted by the observation of univariate differences for the control conditions at least in the auditory–verbal modality. In summary, our results indicate that, although differences in perceptual numerosity may be a confound for univariate load effects observed in previous studies at least during encoding, encoding load is still represented by multivariate activity patterns within the dorsal attention network after control of perceptual numerosity.

The second finding of this study is that two different aspects of attention are represented by activity patterns in the dorsal attention network during WM encoding. We showed that the dorsal attention network, known to be sensitive to WM load, does not reflect top–down attentional control alone. Rather, our study shows that the dorsal attention network represents both the number of items in the focus of attention (as reflected by encoding load) and task-related top–down attentional control (as reflected by the type of items that needs to be selected for retention), in line with WM accounts in which attentional control coexists with attention used for WM storage (e.g., D'Esposito & Postle, 2015; Cowan et al., 2006; Cowan, 1988). This is a critical finding, as previous studies have attributed the intervention of the dorsal attention network during WM tasks implicitly or explicitly to top–down, task-related attentional control mechanisms (Majerus et al., 2012; Riggall & Postle, 2012; Todd et al., 2005; Todd & Marois, 2004).

It is important to note here that the neural classifiers of encoding load and top–down control conditions were independent of each other while involving the same sets of images: Classifiers discriminating top–down control conditions were trained and tested on both high and low encoding load trials for each control condition and hence could not reflect encoding load-related neural differences as these differences were matched within the high and low control conditions. The same was the case for the classifiers discriminating encoding load, which were trained and tested on both high and low top–down control conditions for each load condition. Furthermore, in the verbal modality, univariate analyses revealed an increase of activation for the high control condition, even though this condition led to lower WM load as a consequence of the top–down, selective encoding strategy requiring the subject to focus on only a subset of stimuli. If the attentional control factor merely reflected a load factor, then univariate analyses should have shown diminished activity for the high control condition, given that low load is typically associated with decreased posterior parietal cortex activity (Majerus et al., 2012, 2016; Cowan et al., 2011; Ravizza et al., 2004; Todd & Marois, 2004).
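The independence argument above can be illustrated with a toy example. The following is a minimal sketch, assuming a fully crossed 2 × 2 design with equal trial counts per cell and an artificial two-voxel pattern in which one voxel carries only load and the other only control. The names `toy_pattern`, `centroid_difference`, and `TRIALS_PER_CELL`, as well as the trial count, are hypothetical and do not correspond to the actual analysis pipeline or data.

```python
from itertools import product

LEVELS = ("low", "high")
TRIALS_PER_CELL = 10  # hypothetical trial count per design cell


def toy_pattern(control: str, load: str) -> list[float]:
    """Artificial pattern: voxel 0 codes load only, voxel 1 codes control only."""
    return [float(load == "high"), float(control == "high")]


# Fully crossed design: each control condition contains both load conditions.
trials = [
    {"control": c, "load": l, "pattern": toy_pattern(c, l)}
    for c, l in product(LEVELS, LEVELS)
    for _ in range(TRIALS_PER_CELL)
]


def centroid(trials: list[dict], factor: str, level: str) -> list[float]:
    """Mean pattern over all trials at the given level of the given factor."""
    rows = [t["pattern"] for t in trials if t[factor] == level]
    return [sum(col) / len(rows) for col in zip(*rows)]


def centroid_difference(trials: list[dict], factor: str) -> list[float]:
    """Voxelwise difference between the 'high' and 'low' class centroids."""
    high = centroid(trials, factor, "high")
    low = centroid(trials, factor, "low")
    return [h - l for h, l in zip(high, low)]


control_diff = centroid_difference(trials, "control")
load_diff = centroid_difference(trials, "load")
print("control classes differ by:", control_diff)
print("load classes differ by:", load_diff)
```

Because load is balanced within each control class, the control class centroids differ only on the control voxel, and the converse holds for the load classes. A classifier trained on one factor therefore has no systematic access to the other factor in this idealized case.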

To conclude, this study shows that multivariate neural activity patterns in the dorsal attention network during WM represent not only top–down attentional control aspects but also nonstrategic attentional aspects involved in memory encoding. Nonstrategic aspects of attention reflect the quantity of information in the focus of attention. Top–down controlled attention determines the quality of WM representations and directs the focus of attention to target information in a strategic, task-related manner. These attentional functions are supported by the dorsal attention network in both verbal and visual modalities, but their implementation further requires modality-specific neural processes. This study highlights the complexity of the attentional processes that characterize the recruitment of the dorsal attention network in WM tasks. Given that this study focused on WM encoding, further studies need to determine the extent to which the different aspects of attention identified in this study also characterize the intervention of the dorsal attention network during WM maintenance and retrieval stages.

## Acknowledgments

This work was supported by the government of the French-speaking community of Belgium (ARC, Convention 12/17-01-REST awarded to S. M.), by the Belgian Science Policy (Back-to-Belgium grant awarded to F. P.; Interuniversity Attraction Pole P7/11 awarded to S. M.), and by the National Institutes of Health (NIH R01 HD-21338 awarded to N. C.).

Reprint requests should be sent to Steve Majerus, Psychology and Neuroscience of Cognition Research Unit, Université de Liège, Boulevard du Rectorat, B33, 4000 Liège, Belgium, or via e-mail: smajerus@ulg.ac.be.

