Recent theories assert that visual working memory (WM) relies on the same attentional resources and sensory substrates as visual attention to external stimuli. Behavioral studies have observed competitive tradeoffs between internal (i.e., WM) and external (i.e., visual) attentional demands, and neuroimaging studies have revealed representations of WM content as distributed patterns of activity within the same cortical regions engaged by perception of that content. Although a key function of WM is to protect memoranda from competing input, it remains unknown how neural representations of WM content are impacted by incoming sensory stimuli and concurrent attentional demands. Here, we investigated how neural evidence for WM information is affected when attention is occupied by visual search—at varying levels of difficulty—during the delay interval of a WM match-to-sample task. Behavioral and fMRI analyses suggested that WM maintenance was impacted by the difficulty of a concurrent visual task. Critically, multivariate classification analyses of category-specific ventral visual areas revealed a reduction in decodable WM-related information when attention was diverted to a visual search task, especially when the search was more difficult. This study suggests that the amount of available attention during WM maintenance influences the detection of sensory WM representations.
We are constantly called upon to maintain information temporarily in mind, but this working memory (WM) must also operate in the face of immediate and variable demands for our attention in the environment (e.g., rehearsing a shopping list while navigating heavy traffic). Although attention has typically been described as the selective processing of information that is currently available to the senses—and WM conversely acts on information unavailable to the senses—a large body of evidence indicates that demands on WM and attention reciprocally influence one another (Gazzaley & Nobre, 2012; Awh, Vogel, & Oh, 2006; Awh & Jonides, 2001) and engage many of the same brain regions (Jerde, Merriam, Riggall, Hedges, & Curtis, 2012; Ikkai & Curtis, 2011; Nee & Jonides, 2009; Mayer et al., 2007; Nobre et al., 2004; LaBar, Gitelman, Parrish, & Mesulam, 1999). This has encouraged the reconceptualization of WM as internally oriented attention that endogenously activates perceptual representations in much the same way as attention to external stimuli would (D'Esposito & Postle, 2015; Kiyonaga & Egner, 2013; Chun, 2011; Chun & Johnson, 2011). Accordingly, WM-related sustained increases in mean neural population activity, as indexed by univariate fMRI signal (in dorsolateral prefrontal cortex, for instance), were once assumed to represent the information being held in WM; however, many now consider those responses to reflect attentional control over sensory regions that represent the information content itself (Lara & Wallis, 2015; Postle, 2015; Sreenivasan, Curtis, & D'Esposito, 2014). In other words, attention is recruited to activate sensory representations for the purpose of WM.
Recent multivariate neural evidence also supports this “sensory recruitment” model of WM, whereby short-term representations are maintained via distributed patterns of activity within the same sensory cortical regions engaged by perceptual attention toward that content (e.g., area MT for memory of moving dot arrays; Riggall & Postle, 2012). The orientation of a Gabor grating maintained in WM, for instance, can be successfully decoded or reconstructed based on early visual cortex activity patterns derived from actually perceiving oriented stimuli (Albers, Kok, Toni, Dijkerman, & de Lange, 2013; Ester, Anderson, Serences, & Awh, 2013; Harrison & Tong, 2009; Serences, Ester, Vogel, & Awh, 2009). Conversely, perceived spatial locations can also be decoded from parietal cortex based on activity patterns derived from spatial WM maintenance (Jerde et al., 2012), providing further evidence for overlap in the representational codes for perception and WM. If WM content is indeed maintained in sensory cortices, via attention-dependent activation, a critical question is: What happens to such internally attended information in the face of incoming sensory input and concurrent attentional demands? Behavioral studies have shown that WM often suffers when attention is otherwise occupied during the WM delay (e.g., Fougnie & Marois, 2009), and the extent of that impairment scales with the time consumption of the intervening task (Barrouillet, Portrat, & Camos, 2011). Here, we employed fMRI to determine how this competition between internal and external attentional demands impacts the patterns of neural activation associated with sensory representations of WM content.
Many behavioral studies suggest that concurrent attentional demands can alter the “activation status” of a WM representation, relegating it to a distinct format outside an internal focus of attention (i.e., “silent coding”; Stokes, 2015), into which it can be reinstated when it becomes task relevant again (Gunseli, Olivers, & Meeter, 2016; van Moorselaar, Olivers, Theeuwes, Lamme, & Sligte, 2015; Kiyonaga & Egner, 2014; Kiyonaga, Egner, & Soto, 2012; Olivers, Peters, Houtkamp, & Roelfsema, 2011). Correspondingly, the active neural trace of a WM representation—as detected by multivariate pattern analyses (MVPAs)—is modulated by internal shifts of attention across a trial; immediately task-relevant representations elicit measurable neural signatures, whereas evidence for task-irrelevant memory representations is degraded (LaRocque, Riggall, Emrich, & Postle, 2016; Rose et al., 2016; Sprague, Ester, & Serences, 2016; LaRocque, Lewis-Peacock, Drysdale, Oberauer, & Postle, 2012; Lewis-Peacock, Drysdale, Oberauer, & Postle, 2012).
If activation in visual WM occurs by directing attention internally toward perceptual representations, then directing attention outwardly toward a visual task should similarly modulate WM representational information, and neural activation patterns in regions that represent the WM content should become less discriminable. Here, we used multivariate pattern classification of fMRI data to investigate whether WM category decoding is impacted when attention is occupied by visual search—at varying levels of difficulty—during the delay interval of a WM match-to-sample task. If WM and visual search both rely on attention, neural evidence for WM category representations should be degraded during visual search, and that degradation should be even more pronounced when a more difficult visual search condition diverts attention away from WM maintenance for a longer period.
Thirty healthy volunteers gave written informed consent to participate in accordance with the Duke University institutional review board. All participants were fluent in English, reported normal or corrected-to-normal vision, and were compensated $20 per hour for their participation. Two participants were excluded for missing data, leaving 28 participants in the final analyses (16 men; mean age = 30 years, range = 18–45 years).
The experimental protocol was designed to independently vary “internal” (i.e., WM) and “external” (i.e., visual) attentional load in a fully balanced 2 (WM load: one item vs. two items) × 2 (visual search difficulty: easy vs. hard) factorial design. The task was composed of a delayed match-to-sample WM test, with a sequence of delay-spanning visual searches (Figure 1A). We employed WM sample stimuli with known cortical sensitivities (i.e., faces and houses), so that we could examine the discriminability of visual cortical WM representations, via classifiers trained on the WM category, in the fusiform face area (FFA; Kanwisher, McDermott, & Chun, 1997) and parahippocampal place area (PPA; Epstein & Kanwisher, 1998).
Across different trials, participants had to maintain either one (low WM load) or two (high WM load) faces or houses for a later memory probe (Figure 1B). During the WM delay, participants performed a series of four visual searches for a perfectly vertical target stimulus among horizontal (easy search) or slightly tilted (hard search) distractors (Figure 1C). We borrowed this attentional manipulation approach from the time-based resource-sharing model of WM storage and processing, whereby a harder visual search task should occupy attention—that would otherwise be dedicated to WM maintenance—for a longer period (Barrouillet et al., 2011). Our main analyses focus on this search epoch of the trial, as we wanted to characterize how WM would be impacted by this secondary demand. To produce a balanced design, wherein WM category classification would be uncontaminated by overlapping visual input, visual search stimuli were either bodies (which have been shown to preferentially recruit the extrastriate body area [Downing, Jiang, Shuman, & Kanwisher, 2001]) or tools (which have been shown to recruit lateral occipito-temporal cortex; Chao, Haxby, & Martin, 1999). The design thus produced four main conditions: low WM/easy search, low WM/hard search, high WM/easy search, and high WM/hard search.
Dual-task WM/Visual Search Procedure
The task was programmed and presented in MATLAB (The MathWorks, Inc., Natick, MA) using the Psychophysics Toolbox extensions (Brainard, 1997). Face stimuli were 144 trial-unique grayscale images of male and female faces, drawn from several databases (Tottenham et al., 2009; Oosterhof & Todorov, 2008; Minear & Park, 2004; Kanade, Cohn, & Tian, 2000; Endl et al., 1998; Lundqvist, Flykt, & Öhman, 1998), and cropped to include only the “eye and mouth” region. House stimuli were 144 trial-unique grayscale exterior images drawn from local real estate Web sites. Visual search stimuli were 16 male and female bodies, with heads cropped (Downing et al., 2001), and 16 tools (hammers and wrenches) drawn from freely available online sources. Stimuli were displayed on a back-projection screen against a neutral gray background (RGB: 128 128 128) and viewed through a mirror mounted to the head coil, simulating a viewing distance of approximately 80 cm. Behavioral responses were executed with the left and right hands on MRI-compatible response boxes.
Each trial began with a variable intertrial interval, followed by the WM sample for 2 sec. Low load WM samples consisted of a single, centrally-presented face or house. High load WM samples consisted of either two faces or two houses presented side-by-side. After a variable inter-stimulus interval, a series of four visual search displays appeared for 1.5 sec each, separated by 500-msec fixation intervals, producing a search sequence lasting 8 sec in total. Each search array was composed of four stimuli (either all tool or all body images) at the corners of an imaginary square. In all conditions, the target stimulus was perfectly vertical, whereas three distractors were tilted to the left or right. The task was to indicate whether the target stimulus was oriented right-side up or upside down. For easy search trials, the distractors were perfectly horizontal (i.e., tilted 90° to the left or right), making them easily discriminable from the vertical target. For hard search trials, on the other hand, distractors were slanted only 15° to the left or right, making their orientation less discriminable from the vertical target (Treisman & Gelade, 1980). Importantly, the type and number of stimuli were identical for easy and hard searches, equating the amount of perceptual input across all conditions—only the orientation difference between the target and distractor stimuli varied, serving as the manipulation of search difficulty. All searches within a given trial were of the same difficulty level. The search sequence was followed by a variable ISI and then a WM probe for 3 sec. Participants were asked to rate their confidence, on a 4-point scale, that a single WM probe item was either a match (50% of trials) or nonmatch to an item from the WM sample set. Underneath the probe image, a visual guide instructed which finger of the left hand should be used to indicate a response of either “definitely the same,” “maybe the same,” “maybe different,” or “definitely different.”
WM samples were selected in random order and never repeated across the experiment, except as matching probes. Visual search stimuli, locations, and orientations were also selected in random order on every trial but could repeat across trials. The durations of all intertrial intervals, as well as presearch and postsearch ISIs, were jittered between 2.5 and 5 sec (step size = 500 msec), selected at random from a pseudoexponential distribution (Dale, 1999), and counterbalanced to equate the length of all runs. Therefore, the onset of the visual search series was unpredictable, and the total length of individual trials could vary, but the duration of the search series was held constant at 8 sec for all trials. Participants completed a practice run of 16 trials outside the scanner and then nine experimental runs inside the scanner, each comprising 16 trials, for a total of 144 trials. All trial conditions occurred equally often, and in random order, both within and across runs.
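The trial timing described above can be sketched as follows. This is only an illustration: the halving weights used for the "pseudoexponential" jitter are an assumption (the text does not give the exact distribution), and the 8-sec total is read here as one 500-msec fixation accompanying each of the four displays. All function and variable names are hypothetical.

```python
import random

SEARCH_DISPLAY_S = 1.5    # each search array is shown for 1.5 sec
SEARCH_FIXATION_S = 0.5   # 500-msec fixation paired with each display
N_SEARCHES = 4
N_RUNS, TRIALS_PER_RUN = 9, 16

# Jitter values from 2.5 to 5 sec in 500-msec steps.
JITTERS_S = [2.5 + 0.5 * i for i in range(6)]


def search_sequence_duration():
    """Total length of the delay-spanning search series, in seconds."""
    return N_SEARCHES * (SEARCH_DISPLAY_S + SEARCH_FIXATION_S)


def sample_jitter(rng):
    """Draw one interval, favoring short values (assumed halving weights)."""
    weights = [2.0 ** -i for i in range(len(JITTERS_S))]
    return rng.choices(JITTERS_S, weights=weights, k=1)[0]


def sample_trial(rng):
    """Return (event, duration) pairs for one trial, in presentation order."""
    return [
        ("iti", sample_jitter(rng)),
        ("wm_sample", 2.0),
        ("pre_search_isi", sample_jitter(rng)),
        ("search_series", search_sequence_duration()),
        ("post_search_isi", sample_jitter(rng)),
        ("wm_probe", 3.0),
    ]


rng = random.Random(0)
trial = sample_trial(rng)
total_trials = N_RUNS * TRIALS_PER_RUN
```

Only the jittered intervals vary from trial to trial in this sketch; the search series itself always has the same length, which is the property the design relies on.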
Functional Localizer Procedure
Participants also completed a functional localizer task to define cortical ROIs that preferred each of the WM and visual search stimulus categories (i.e., faces, houses, bodies, and tools). Each stimulus category was presented in separate blocks; each block entailed a series of 15 images, centrally presented for 750 msec and separated by a 250-msec fixation. Participants were asked to make a button response to direct repetitions of a specific stimulus (i.e., 1-back task). The run was composed of 16 blocks (four of each condition), which were separated by 10-sec interblock intervals and occurred in random order.
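As a rough illustration of the localizer structure, the sketch below builds a randomized block order and the corresponding block onsets. It assumes that the run starts with a block at time zero and that one interblock interval follows every block; neither detail is stated in the text, and the names are hypothetical.

```python
import random

CATEGORIES = ["face", "house", "body", "tool"]
BLOCKS_PER_CATEGORY = 4
IMAGES_PER_BLOCK = 15
IMAGE_S, FIXATION_S = 0.75, 0.25   # 750-msec image, 250-msec fixation
INTERBLOCK_S = 10.0


def block_duration():
    """Length of one stimulus block, in seconds."""
    return IMAGES_PER_BLOCK * (IMAGE_S + FIXATION_S)


def make_block_order(rng):
    """Four blocks per category, shuffled into a random order."""
    blocks = [c for c in CATEGORIES for _ in range(BLOCKS_PER_CATEGORY)]
    rng.shuffle(blocks)
    return blocks


def block_onsets(order):
    """Onset of each block, assuming the first block starts at t = 0."""
    onsets, t = [], 0.0
    for category in order:
        onsets.append((category, t))
        t += block_duration() + INTERBLOCK_S
    return onsets


order = make_block_order(random.Random(1))
onsets = block_onsets(order)
```

The computed block length matches the 15-sec block duration used when modeling the localizer.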
Functional data were recorded on a 3.0-T GE MR750 scanner, using a gradient-echo, T2*-weighted multiphase EPI sequence. Forty contiguous axial slices were acquired in interleaved order, parallel to the AC–PC plane (voxel size = 3 × 3 × 3 mm, repetition time = 2 sec, echo time = 28 msec, flip angle = 90°, field of view = 24 cm). Structural data were obtained with a 3-D T1-weighted fast inversion-recovery-prepared spoiled gradient recalled pulse sequence, recording 154 slices of 1-mm thickness and an in-plane resolution of 1 × 1 mm.
Analyses were done in MATLAB using SPM8 (Wellcome Department of Imaging Neuroscience, London, United Kingdom; www.fil.ion.ucl.ac.uk/spm/software/spm8). The first five volumes of each run were discarded to allow for a steady state of tissue magnetization. Functional data were then slice time corrected and spatially realigned to the first volume, coregistered with participants' structural scans, and normalized to the Montreal Neurological Institute template brain. Normalized functional images retained their native spatial resolution.
Mass Univariate Analyses
For analyses based on task-related changes in mean signal intensity, the normalized images were spatially smoothed with a Gaussian kernel of 9 mm³ FWHM, before applying a 128-sec temporal high-pass filter to remove low-frequency noise. A model of the main task was created for each participant via vectors corresponding to the onset of the visual search series (8-sec boxcar) for each experimental condition; the model accounted for WM and visual search load conditions, as well as stimulus category for both WM and search task components, resulting in 16 regressors of interest. All univariate analyses collapsed across stimulus category conditions, however, producing four main conditions of interest: low WM/easy search, low WM/hard search, high WM/easy search, and high WM/hard search. WM sample and probe periods, error trials (for both visual search and WM probe), head motion parameters, and grand means of each run were also modeled as separate nuisance regressors. Onset vectors were convolved with a canonical hemodynamic response function to produce a design matrix, against which the BOLD signal at each voxel was regressed.
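The count of 16 regressors of interest, and the way they collapse into the four main conditions, can be made explicit with a short enumeration. The label strings are illustrative placeholders, not the names used in the actual SPM model.

```python
from itertools import product

WM_LOAD = ["low_wm", "high_wm"]
SEARCH_DIFFICULTY = ["easy_search", "hard_search"]
WM_CATEGORY = ["face", "house"]
SEARCH_CATEGORY = ["body", "tool"]

# One regressor per combination of load, difficulty, and stimulus categories.
regressors = list(product(WM_LOAD, SEARCH_DIFFICULTY, WM_CATEGORY, SEARCH_CATEGORY))


def collapse_across_category(regs):
    """Group regressors by (WM load, search difficulty), ignoring category."""
    groups = {}
    for wm_load, difficulty, wm_cat, search_cat in regs:
        groups.setdefault((wm_load, difficulty), []).append((wm_cat, search_cat))
    return groups


conditions = collapse_across_category(regressors)
```

Each of the four collapsed conditions pools four category combinations, so the univariate contrasts weight stimulus categories equally.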
Single-participant contrasts were then calculated to establish the hemodynamic correlates of WM load (all two-item WM > all one-item WM), visual search difficulty (all hard search > all easy search), and their interaction effects (Low WM/Easy Search + High WM/Hard Search > Low WM/Hard Search + High WM/Easy Search). Group effects were subsequently assessed by submitting the individual SPMs to one-sample t tests where participants were treated as random effects. To control for false positives, we applied a whole-brain voxel-wise false discovery rate (FDR) correction (p < .05, combined with a cluster extent of 20 voxels). To illustrate the nature of the observed activations, mean β estimates for each condition were extracted from 6-mm spherical ROIs, centered on peak group activations, using MarsBaR software (marsbar.sourceforge.net).
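The three contrasts can be written as weight vectors over the four collapsed conditions. The sketch below follows the contrast definitions given above; the beta values in the example are hypothetical numbers chosen to show that purely additive effects yield an interaction contrast of zero.

```python
# Condition order used for every weight vector below.
CONDITIONS = ["low_wm/easy", "low_wm/hard", "high_wm/easy", "high_wm/hard"]

CONTRASTS = {
    # all two-item WM > all one-item WM
    "wm_load": [-1, -1, 1, 1],
    # all hard search > all easy search
    "search_difficulty": [-1, 1, -1, 1],
    # low/easy + high/hard > low/hard + high/easy
    "interaction": [1, -1, -1, 1],
}


def contrast_value(weights, betas):
    """Weighted sum of condition betas for one contrast."""
    return sum(w * b for w, b in zip(weights, betas))


# Hypothetical betas with additive effects only (+1.0 for hard, +0.5 for high WM).
additive_betas = [1.0, 2.0, 1.5, 2.5]
interaction = contrast_value(CONTRASTS["interaction"], additive_betas)
```

The interaction weights are the element-wise product of the two main-effect vectors, which is the usual construction for a 2 × 2 design.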
Regions of sensitivity for the WM categories were derived from the independent functional localizer task. A model of the localizer was created for each participant via vectors corresponding to the onset of the stimulus blocks (15-sec duration) for each of the four stimulus categories (face, house, body, and tool). Single-participant contrasts were then calculated to establish the hemodynamic correlates of house viewing (all house > all other categories) and face viewing (all face > all other categories). Group maps were furthermore constrained by anatomical masks of the fusiform and parahippocampal gyri (generated with the WFU_Pickatlas Toolbox; Maldjian, Laurienti, Kraft, & Burdette, 2003), for the face and house contrasts, respectively, and submitted to FDR correction (p < .05) to identify clusters of maximal responsivity to the stimulus categories.
Although standard mass-univariate analyses allow us to localize regions where mean signal intensity is sensitive to internal and external load demands, such variations do not convey precise informational content. Instead, the strength of multivariate decoding can arguably serve as a proxy for the quality of a neural representation (Emrich, Riggall, LaRocque, & Postle, 2013; Ester et al., 2013). We therefore created two models for multivariate analyses, using unsmoothed images, with the purpose of gauging how the discriminability of the neural WM representation is impacted when attention is diverted to processing external stimuli. The first “temporal” model included WM category (face vs. house) and visual search load (easy vs. hard) conditions via vectors of onsets (2-sec duration; i.e., a single repetition time) for each event in a trial—WM sample, WM delay, search trials, preprobe delay, and WM probe—convolved with a canonical hemodynamic response function. The second “searchlight” model was identical except visual searches within a trial were now reflected by a single onset (8-sec duration). Head motion parameters and grand means of each run were also modeled as separate nuisance regressors. ROI and searchlight classification analyses were implemented by training linear support vector machines, via the caret and kernlab packages in R (Kuhn, 2008; Zeileis, Hornik, Smola, & Karatzoglou, 2004) and the Decoding Toolbox (Hebart, Görgen, & Haynes, 2014), using a leave-one-run-out cross-validation procedure. Our design produced nine experimental runs, wherein each condition occurred equally often. Each classifier was thus trained on the patterns corresponding to maintenance of each WM category over eight runs and then tested on its ability to decipher the remembered category on the ninth run. 
The held-out run was then rotated so that each run served exactly once as the testing set, and classifier accuracy for a given searchlight or ROI reflected the average classifier performance over those nine iterations.
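The leave-one-run-out scheme can be sketched independently of any particular classifier. Here `fit_and_score` is a hypothetical callback standing in for training and testing a linear support vector machine; it is not the caret, kernlab, or Decoding Toolbox interface. The two patterns per run in the example are likewise an assumption made for brevity.

```python
def leave_one_run_out(run_labels):
    """Yield (held_out_run, train_indices, test_indices) for each run."""
    for held_out in sorted(set(run_labels)):
        train = [i for i, r in enumerate(run_labels) if r != held_out]
        test = [i for i, r in enumerate(run_labels) if r == held_out]
        yield held_out, train, test


def cross_validated_accuracy(run_labels, fit_and_score):
    """Average the per-fold accuracy returned by a user-supplied callback."""
    scores = [
        fit_and_score(train, test)
        for _, train, test in leave_one_run_out(run_labels)
    ]
    return sum(scores) / len(scores)


# Nine runs with two patterns each (assumed: one per WM category).
run_labels = [run for run in range(1, 10) for _ in range(2)]
folds = list(leave_one_run_out(run_labels))
```

Splitting by run, rather than by individual trial, keeps training and testing patterns from sharing run-specific noise.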
Event-related ROI-based MVPA
Activity patterns coding WM information fluctuate over the course of an unfilled WM delay (Myers et al., 2015; Wolff, Ding, Myers, & Stokes, 2015; Sreenivasan, Vytlacil, & D'Esposito, 2014; Stokes et al., 2013; Lewis-Peacock et al., 2012; Meyers, Freedman, Kreiman, Miller, & Poggio, 2008). Furthermore, behavioral findings have shown that attention demands and time-related decay can interact in their impact on WM maintenance (Kiyonaga & Egner, 2014). Because the fate of neural WM representation patterns in a dual-task setting—when external stimuli must be attended—is unknown, we examined how the neural activity patterns conveying WM content would be impacted by the difficulty of a secondary task and, moreover, how this impact might accumulate or evolve over the course of a trial as attention continued to be otherwise occupied. Specifically, we wanted to assess the discriminability of WM information (i.e., whether the remembered category was a face or a house) across distributed regions that are engaged for the perception of the WM categories. We therefore conducted event-related MVPA within PPA and FFA ROIs that were independently and functionally defined with a separate localizer task. Classifiers were trained and tested on beta estimates from all voxels in each ROI; to account for differences in univariate activity that might influence decoding performance, these classifier inputs were mean centered and scaled for each condition, at each time point, within each ROI. To assess decoding of WM category across time, separate classifiers were trained for each task event, wherein the inputs were beta estimates for each 2-sec event across a trial, at both levels of visual attentional load (i.e., from the “temporal” model). Furthermore, to account for random resampling and model calculations within the caret package, each classifier was repeated 50 times; these accuracies were averaged to produce a single accuracy value for the classification. 
We therefore obtained, for each participant, two WM category mean classification accuracies (one each for easy and hard visual search conditions) at each of eight trial time points. To test whether decoding was reliable in each condition, these mean accuracy values were submitted to one-sample t tests against chance (50%); to test whether decoding differed between easy and hard external attention conditions, they were submitted to paired t tests against one another.
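The two kinds of test applied to the accuracy values can be sketched with the standard library alone. The accuracies below are made-up placeholders, not data from the study, and the functions return only the t statistic (degrees of freedom are n − 1 in both cases).

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu):
    """t statistic for the mean of `values` against a fixed value `mu`."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / math.sqrt(n))


def paired_t(a, b):
    """t statistic for the mean within-participant difference a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    return one_sample_t(diffs, 0.0)


# Hypothetical per-participant decoding accuracies at one time point.
easy = [0.56, 0.61, 0.48, 0.59, 0.53, 0.64]
hard = [0.50, 0.52, 0.49, 0.47, 0.55, 0.51]

t_easy_vs_chance = one_sample_t(easy, 0.5)
t_hard_vs_chance = one_sample_t(hard, 0.5)
t_easy_vs_hard = paired_t(easy, hard)
```

The paired test is simply a one-sample test on the difference scores, which is why both can share one helper.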
Although ROI-based classification analyses test how memory information is represented by brain regions known to be sensitive to specific categories, we further examined the distribution of WM category information across the entire brain. Although many recent studies have decoded WM content information from primarily sensory regions that perceive that content (during an unfilled delay), a handful of studies have also found multivariate WM information in prototypically “attentional” frontal and parietal regions (Sprague et al., 2016; Ester, Sprague, & Serences, 2015; Christophel, Hebart, & Haynes, 2012). One study also suggests a unique role for parietal cortex in WM content representation in the face of predictable irrelevant distractors—which can presumably be anticipated and ignored (Bettencourt & Xu, 2016). To identify regions that might convey locally distributed patterns of WM category information, during completion of a delay-spanning visual task, we conducted whole-brain searchlight MVPA (Haynes et al., 2007; Kriegeskorte, Goebel, & Bandettini, 2006). The searchlight was a spherical cluster with a radius of three voxels, thus containing up to 123 voxels. Unlike in the ROI-based MVPA, inputs to the searchlight analysis were beta estimates across the entire 8-sec search sequence (i.e., from the “searchlight” model). A separate classifier was trained to discriminate the WM category based on multivariate patterns of input from all the voxels in a given searchlight, and that procedure was repeated for searchlights surrounding every gray matter voxel in the brain. The resultant participant-level accuracy maps were submitted to t tests at the group level against chance performance (50%) and thresholded with an FDR correction of p < .05.
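The searchlight size follows from counting the voxels whose centers lie within three voxel widths of the center voxel. The sketch below reproduces that count and shows why searchlights near the edge of the volume contain fewer voxels. It clips only to the bounding box of the volume; restricting to a gray matter mask, as in the analysis, would be an additional step not shown here.

```python
def sphere_offsets(radius):
    """Integer (x, y, z) offsets whose distance from the origin is <= radius."""
    r = int(radius)
    span = range(-r, r + 1)
    return [
        (x, y, z)
        for x in span
        for y in span
        for z in span
        if x * x + y * y + z * z <= radius * radius
    ]


def searchlight_voxels(center, shape, radius=3):
    """Voxel coordinates of one searchlight, clipped to the volume bounds."""
    voxels = []
    for dx, dy, dz in sphere_offsets(radius):
        x, y, z = center[0] + dx, center[1] + dy, center[2] + dz
        if 0 <= x < shape[0] and 0 <= y < shape[1] and 0 <= z < shape[2]:
            voxels.append((x, y, z))
    return voxels


full = searchlight_voxels((5, 5, 5), (11, 11, 11))
corner = searchlight_voxels((0, 0, 0), (11, 11, 11))
```

A fully interior searchlight contains 123 voxels, whereas one centered on a corner voxel is truncated, which is the sense of "up to 123 voxels" above.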
Visual search accuracy (percent correct) was lower when the search distractors were less discriminable from the target (easy: M = 94.8%, SD = 9.1%; hard: M = 84.8%, SD = 12.9%), F(1, 27) = 27.3, p < .001, and search correct RT was also drastically slower in this high attentional load condition (easy: M = 848 msec, SD = 105 msec; hard: M = 1126 msec, SD = 112 msec), F(1, 27) = 257.1, p < .001. The external attentional load manipulation was therefore an effective means of modulating the time consumption of WM delay-spanning processing: When the search was harder, it took more time to complete, suggesting that attention would be diverted from WM maintenance during that time (Figure 2A). Although search accuracy was unaffected by internal (WM) load (Figure 2B, p = .7), search was slightly faster when two items were maintained in WM, F(1, 27) = 4.2, p = .052. Although we expected increased WM demands to impair concurrent attentional performance, this unexpected finding is consistent with the theory that increased attentional load should reduce processing of irrelevant distraction (Kim, Kim, & Chun, 2005; Lavie, 2005; de Fockert, Rees, Frith, & Lavie, 2001), which would improve search efficiency. Alternatively, higher demands may engage more cognitive control and therefore benefit ongoing processing (e.g., Waskom, Kumaran, Gordon, Rissman, & Wagner, 2014; Jha & Kiyonaga, 2010). However, neither search accuracy nor RT displayed an interaction between internal and external attention factors (all ps > .4).
WM probe performance was slower (low: M = 1464 msec, SD = 296 msec; high: M = 1636 msec, SD = 260 msec), F(1, 27) = 76.2, p < .001 (Figure 2C), and less accurate (low: M = 91.7%, SD = 9.1%; high: M = 80.4%, SD = 12.3%), F(1, 27) = 103.3, p < .001 (Figure 2D), when two items were remembered (vs. 1). The WM manipulation was thus effective at increasing internal attentional demands. Face WM (88%) was slightly better than house WM (85%), but not significantly so (p = .07), whereas the visual search stimulus category had no influence on WM performance (p = .4). WM probe responding was unexpectedly faster after harder visual search sequences, F(1, 27) = 8.2, p = .008. Like the unexpected improvement to visual search RT during higher WM load, this finding also suggests that the engagement of control during harder visual search may have benefitted WM speed (e.g., Waskom et al., 2014; Jha & Kiyonaga, 2010). However, probe accuracy was unaffected by the search difficulty, and neither probe accuracy nor RT displayed an interaction between internal and external load factors (all ps > .4).
Although participants had the option to submit WM responses on a 4-point scale, responses disproportionately favored the extremes of the scale (either “definitely same” or “definitely different”). Fewer than 25% of all responses used either “maybe” option, and four participants neglected to use those responses at all; thus, we report WM probe performance collapsed across confidence levels. Nonetheless, we conducted an additional control ANOVA of WM accuracy, with the added factor of Response confidence, and found no interactions between the Confidence factor and either WM load (p = .12) or Search difficulty (p = .5) and no three-way interaction (p = .35). Because the jittered delay lengths produced total WM delays that ranged from 12.5 to 17.5 sec, we also conducted a control ANOVA with a factor of Delay length—split into three bins for long, medium, and short delays—and found no main effect of Delay length on WM probe accuracy (p = .5) and no interactions between Delay length and any other factors (all ps > .2). Thus, neither WM recognition confidence nor total duration of the WM delay appears to have significantly impacted the results.
The absence of an interaction between the WM and visual search load factors was surprising because of the abundant prior evidence that WM performance can be impaired by concurrent attention demands (but see Hollingworth & Maxcey-Richard, 2013; Woodman & Luck, 2007; Vogel, Woodman, & Luck, 2006; Woodman, Vogel, & Luck, 2001). Thus, we investigated whether any potential reciprocity between internal and external attention demands may have manifested in another way in the present data set, for instance, via a tradeoff between the two task components. First, we conducted a mixed-effects logistic regression analysis, wherein the probability of a correct WM response was predicted by visual search RT on a trial-by-trial basis, accounting for individual differences between participants (i.e., modeling a random effect of participant). We limited the predictive factor of search RT to correct responses on the first search in each series, when WM encoding processes are expected to spill over and be maximally impacted by the external attentional task. Indeed, visual search RT significantly predicted a correct WM response, odds ratio = 0.50 (95% CI [0.30, 0.83]), p = .007, whereby faster search RT predicted better WM accuracy on a trial-by-trial basis. Within this model, WM accuracy was predicted by neither search load, p = .96, nor the interaction between search RT and search load, p = .61. These results are consistent with the idea that faster completion of the visual search task freed up (shared) attention for WM maintenance and therefore facilitated WM performance.
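For readers who want to relate the reported odds ratio, confidence interval, and p value to one another, the sketch below back-computes a Wald z statistic on the log-odds scale. It assumes the interval is a symmetric Wald interval, which the text does not state, so the result is only an approximate consistency check and not a reanalysis.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-odds, standard error, z, and two-sided p from a 95% CI.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return log_or, se, z, p


log_or, se, z, p = wald_from_ci(0.50, 0.30, 0.83)
```

An odds ratio below 1 corresponds to a negative coefficient, meaning that longer search RTs go with lower odds of a correct WM response. Under the stated assumption, the recovered p value falls close to the reported one, with small differences expected from rounding of the interval bounds.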
We also examined correlations between mean visual search and WM “load effects” (i.e., High load − Low load) and found that the two were negatively correlated with one another, r = −.52, p = .005. A larger effect of visual search load on search RT was associated with a smaller effect of WM load on probe accuracy. A larger visual search load effect was also strongly associated with better WM accuracy overall (r = .75, p < .001). To better characterize this association, we examined correlations between normalized visual search RT and WM accuracy at each search difficulty level (Figure 2E). When the search series was easy, faster search performance was associated with better memory, r = −.73, p < .001. When the search series was harder, however, the pattern was reversed: instead, slower search RT was associated with better memory accuracy, r = .65, p < .001. This pattern might emerge if the current visual search load were to impact the WM maintenance strategy that would most benefit WM performance. For instance, when the intervening visual task is hard and cannot be completed quickly, it may benefit WM maintenance to alternate attention between visual searching and refreshing the WM content (which would extend the search time). When the search is easier, however, it may be most effective to complete it as quickly as possible and then turn to WM maintenance processes. Thus, although harder visual search did not reduce WM accuracy overall, performance on the visual search task component was strongly related to WM performance both across the entire task (Figure 2E) and on a trial-by-trial basis. We next examined how these simultaneous WM and visual attention demands are reflected in neural measures.
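The across-participant analysis rests on two simple computations: a per-participant load effect (high load minus low load) and a Pearson correlation between two such measures. The sketch below shows both, along with the usual conversion from r to a t statistic with n − 2 degrees of freedom. Using n = 28 for that conversion is an assumption that every participant in the final sample entered the correlation.

```python
import math
from statistics import mean


def load_effect(high, low):
    """Per-participant difference score: high load minus low load."""
    return [h - l for h, l in zip(high, low)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic (df = n - 2) corresponding to a correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


t_load_effects = r_to_t(-0.52, 28)
```

A negative correlation between the two load effects means that participants whose search slowed more under hard search tended to show a smaller accuracy cost of holding two items.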
Mass Univariate fMRI Results
We initially conducted mean signal intensity-based analyses to localize areas that respond to attentional task load and may reflect competition between internal and external task demands. First, we identified regions that displayed a main effect of the external (visual search) attentional load, during the search sequence (Figure 3A; all whole-brain FDR-corrected, p < .05). These encompassed a large bilateral network of frontal, parietal, and occipital cortical regions that are considered part of a “cognitive control network” and are typically engaged when task demands are high (Bertolero, Yeo, & D'Esposito, 2015; Power & Petersen, 2013; Niendam et al., 2012) as well as the cerebellum, thalamus, and basal ganglia. Thus, both visual search behavior and univariate neural measures were highly responsive to the external (visual attentional) load manipulation.
Although we observed no main effect of WM load (during completion of the search task), there was a robust interaction between visual search and WM demands (Figure 3B). Lateral prefrontal, parietal, posterior temporal (around the temporoparietal junction), and cerebellar clusters were sensitive to the combination of load in both the internal and external domains. Unthresholded t maps for both main and interaction effects are available online (neurovault.org/collections/AZELKTWQ/). As illustrated with beta values extracted from ROIs centered on local maxima of the interaction (Figure 3B), the magnitude of the response to search difficulty was dramatically magnified when WM load was high as well. Notably, rather than an activation increase with each increasing level of task demand, the heightened response to WM load during the harder visual search was reversed during the easier search. This interaction is consistent with the suggestion, from the behavioral results, that fundamentally different WM maintenance strategies may be employed during different attentional states. In summary, both the behavioral and univariate neural responses to WM load were impacted by concurrent visual search demands. Next, we assessed the fate of distributed neural patterns of WM category information in the face of competition for attention by external stimuli.
Multivariate fMRI Results
ROI-based MVPA Results
Our primary multivariate analysis addressed (1) how the discriminability of WM category information evolves across the trial in face- and house-sensitive ROIs and (2) how that evolution is impacted by the difficulty of an intervening visual task. WM category classification in the PPA (which was independently defined by univariate contrasts of a separate functional localizer task) displayed a U-shaped pattern across the trial (Figure 4A). Unsurprisingly, regardless of the search difficulty of the current trial, classification of the WM category was highly accurate (M = 85%) during presentation of the WM sample (i.e., when the stimulus was actually being perceived). Regardless of the difficulty condition of the visual search, WM decoding accuracy dropped after the offset of the WM sample. WM classification performance diverged, however, with the start of the visual search series, depending on the attentional demands of that search sequence. When the search was easier, WM category classification remained above chance (S1: t(27) = 2, p = .05). When the search was more difficult (and therefore diverting attention for a longer period), however, WM category classification dropped down to chance levels (S1: t(27) = −0.26, p = .8) and remained at chance throughout the rest of the search series (S2: t(27) = −1.3, p = .2; S3: t(27) = 0.8, p = .4; S4: t(27) = 0.7, p = .5). This difference between the easy and hard visual search conditions with respect to chance level decoding was also borne out in direct comparison between these conditions: WM category classification was significantly better for the easier search condition, especially early in the search series (F(1, 27) = 4, p = .05; S1: t(27) = 1.9, p = .07; S2: t(27) = 2.3, p = .03; Figure 4B). 
The distribution of classifier accuracies for individuals at the second search trial (when classification between the two conditions significantly differs) also illustrates that many more participants displayed highly accurate classification when the visual attention demands were low (Figure 4B). Thus, even in the face of persistent visual input—as well as a secondary task being performed on that input—category-diagnostic WM stimulus information was present in distributed patterns of neural activity when external visual attention demands were low, but not when external demands were high.
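The comparisons against chance and between easy and hard search conditions reported above are t tests on per-participant classifier accuracies. The following is a minimal sketch of those two tests, not the authors' analysis code. It assumes a chance level of 50% for the two-category (face vs. house) problem, uses made-up accuracy values, and the function names are illustrative. Only the t statistic and degrees of freedom are computed; a p value would come from a t distribution with n − 1 degrees of freedom.

```python
import math
import statistics


def one_sample_t(values, mu):
    """Return (t, df) for a one-sample t test of mean(values) against mu."""
    n = len(values)
    mean = statistics.fmean(values)
    # statistics.stdev uses the sample (n - 1) denominator.
    sem = statistics.stdev(values) / math.sqrt(n)
    return (mean - mu) / sem, n - 1


def paired_t(a, b):
    """Return (t, df) for a paired t test, i.e. a one-sample test on a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    return one_sample_t(diffs, 0.0)


# Hypothetical per-participant decoding accuracies at one search time point.
easy = [0.60, 0.50, 0.70, 0.60]
hard = [0.50, 0.50, 0.50, 0.50]
CHANCE = 0.5  # assumption: two equally likely WM categories

t_chance, df = one_sample_t(easy, CHANCE)
t_diff, _ = paired_t(easy, hard)
print(f"easy vs. chance: t({df}) = {t_chance:.2f}")
print(f"easy vs. hard:   t({df}) = {t_diff:.2f}")
```

With 28 participants, the same functions would return 27 degrees of freedom, matching the t(27) values reported in the text.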
As more time passed across the visual search task, classifier performance converged to chance regardless of the difficulty of the search, suggesting that repeated perceptual input and attentional processing can eventually impede the detection of sensory WM patterns, even in a lower demand condition. When the probe appeared, however, WM category classification again increased above chance (M = 68%). A mixed-effects logistic regression analysis also revealed that longer visual search RTs significantly predicted worse classifier accuracy in PPA on a trial-by-trial basis, particularly for higher WM performers (search RT × WM performance interaction, p = .033). Accordingly, when we median split the group by behavioral WM probe performance to illustrate this result, the attention-sensitive decoding pattern was especially pronounced (Figure 4C). Only the higher WM performance group displayed more accurate WM category classification for easy versus hard search conditions (S2: t(13) = 2.4, p = .036), whereas the lower WM performance group displayed highly similar classifier performance for easy and hard search conditions across the entire trial (S2: t(13) = 0.67, p = .5). Here, we thus have evidence that the neural patterns of activity that convey information about the WM content are impeded, or possibly recoded in a different format (Stokes, 2015; Olivers et al., 2011), when demands on visual attention are high and require longer processing times. Although these mean differences between conditions (and improvement over chance performance) are modest, the classifier accuracies are consistent with previous decoding studies of WM category and the modulation of that classifier evidence via retro-cuing (Lewis-Peacock, Drysdale, & Postle, 2014) or external magnetic stimulation (Rose et al., 2016). Here, however, rather than endogenously shifting attentional priority within WM (e.g., in response to a retro-cue), attention was occupied by a demanding perceptual task.
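The median split described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the accuracy values are invented, the function name is hypothetical, and ties are broken by participant id because the text does not say how ties were handled. The reported degrees of freedom (t(27) for the full sample, t(13) within each subgroup) are consistent with 28 participants divided into two groups of 14.

```python
def median_split(scores):
    """Split participant ids into (lower, higher) halves by score.

    `scores` maps participant id -> behavioral WM probe accuracy.
    Assumption: ties are broken by participant id, which keeps the
    split deterministic; the source does not specify tie handling.
    """
    ranked = sorted(scores, key=lambda pid: (scores[pid], pid))
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


# Hypothetical WM probe accuracies for 28 participants (ids 0..27).
wm_accuracy = {pid: 0.70 + 0.01 * pid for pid in range(28)}
lower, higher = median_split(wm_accuracy)
print(len(lower), len(higher))
```

Each subgroup can then be tested separately for an easy-versus-hard difference in classifier accuracy, which is what the S2 comparisons above report.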
An alternative explanation for this outcome is that, although the WM representations themselves remain unaltered by the search difficulty, their detection is affected by unrelated attentional activity within the same brain regions. Although the classification preprocessing steps (i.e., mean centering and scaling of classifier inputs) help to mitigate this concern, if this were true, we would expect the univariate response in the decoding ROI (i.e., PPA) to positively relate to the search load effect on decoding accuracy. Although univariate activity in the PPA was descriptively greater during harder search difficulty, the univariate search load effect was in fact uncorrelated with the search-related difference in decoding accuracy (from the same ROI) at all search time points (all rs < .1, all ps > .7). To further corroborate this null effect, we also conducted Bayesian correlations between the univariate and multivariate search load effects at each search time point, which revealed moderate Bayes factors ranging from 3.9 to 5.5 in favor of the null hypothesis. Thus, it is unlikely that the univariate response to search difficulty can explain the observed sensitivity of WM category decoding to the difficulty of the intervening visual task.
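To make the logic of a Bayes factor in favor of the null concrete, the sketch below correlates two per-participant effects and converts the correlation into an approximate Bayes factor. Note the substitution: the text does not state which Bayesian correlation procedure was used, so this example swaps in the BIC approximation, BF01 ≈ √n · (1 − r²)^(n/2), which compares an intercept-only model with a one-predictor linear model. The data and function names are hypothetical, and the numbers it produces should not be read as a reproduction of the reported values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bf01_bic(r, n):
    """BIC-approximate Bayes factor for the null (no correlation) over
    the alternative. Values above 1 favor the null.

    Derivation: BIC_alt - BIC_null = n * ln(1 - r^2) + ln(n), and
    BF01 is approximately exp((BIC_alt - BIC_null) / 2).
    """
    return math.sqrt(n) * (1.0 - r * r) ** (n / 2)


# Hypothetical per-participant search load effects (hard minus easy).
univariate_effect = [0.2, -0.1, 0.4, 0.0, 0.3, -0.2, 0.1, 0.5]
decoding_effect = [0.05, 0.02, -0.03, 0.04, -0.01, 0.00, 0.03, -0.02]

r = pearson_r(univariate_effect, decoding_effect)
print(f"r = {r:.2f}, BF01 = {bf01_bic(r, len(univariate_effect)):.2f}")
```

Under this approximation the evidence for the null is bounded by √n when r = 0, so a sample of a given size can only ever provide limited support for a null correlation.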
In the FFA, WM category classification followed a similar U-shaped pattern to that in the PPA, but decoding remained at chance levels at all visual search time points, for both easy and hard search conditions (all ps > .2). That is, although stimulus category pattern classification was accurate during perception of the stimuli (i.e., during sample and probe periods), we were unable to classify the WM category from the FFA during the intervening visual task, in either attentional load condition. The functionally defined FFA was substantially smaller than the PPA. Moreover, several prior studies that have used face and house WM stimuli have focused on the PPA (over the FFA) and have found better pattern classification or more sensitive and behaviorally meaningful activations associated with “place” processing in general (Derrfuss, Ekman, Hanke, Tittgemeyer, & Fiebach, 2017; Kim, Lewis-Peacock, Norman, & Turk-Browne, 2014; Lewis-Peacock & Norman, 2014; Gazzaley, Cooney, Rissman, & D'Esposito, 2005; Yi, Woodman, Widders, Marois, & Chun, 2004). Although WM recognition performance was comparable for face (87%) and house (85%) memory, our findings and others suggest that WM-related activity in the FFA may be less diagnostic than that in the PPA, at least under the kind of dual-task conditions imposed during the visual search period in our experiment.
Searchlight MVPA Results
We also applied a searchlight procedure to identify regions that convey locally distributed patterns of WM category information, across the duration of the search sequence. Indeed, large clusters of searchlights covering the ventral occipito-temporal cortex classified the WM category significantly above chance (Figure 4D; whole-brain FDR-corrected, p < .05), even in the face of persistent visual input and additional attention demands. Individual searchlights were scattered across the rest of the brain (including frontal and parietal regions), but searchlights that classified the WM category above chance were overwhelmingly located in the ventral visual regions that typically respond to perception of stimuli from those categories (see whole-brain classifier accuracy map at neurovault.org/collections/AZELKTWQ/). We also ran two additional searchlight analyses, split by the difficulty of the intervening visual search. These analyses halved the number of beta inputs into each classifier, and neither analysis revealed searchlights that passed whole-brain FDR correction (but the spatial distribution of classifier performance for easy and hard visual search conditions can be examined at neurovault.org/collections/AZELKTWQ/).
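Whole-brain FDR correction means that each searchlight's p value is judged against a threshold that depends on its rank among all searchlights. The sketch below illustrates this with the Benjamini-Hochberg step-up procedure. That choice is an assumption: the text reports FDR correction at p < .05 but does not name the procedure, and the function name and p values here are purely illustrative.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where a test survives FDR control
    at level q under the Benjamini-Hochberg step-up procedure.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p value falls
        # at or below its rank-scaled threshold q * rank / m.
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    survives = [False] * m
    for idx in order[:cutoff_rank]:
        survives[idx] = True
    return survives


# Hypothetical p values from four searchlights.
print(benjamini_hochberg([0.001, 0.01, 0.04, 0.5]))
```

Because the threshold scales with the number of tests and the ordering of p values, an analysis with fewer inputs per classifier (and therefore noisier accuracies) can fail correction even when its spatial pattern resembles the combined analysis.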
Here, we tested the hypothesis that demands on visual attention should impact neural representations of visual WM content, based on the idea that WM maintenance occurs via attention-dependent recruitment of sensory cortices. We manipulated levels of both WM and visual search load in a dual-task paradigm and found converging behavioral and neuroimaging evidence that these “internal” and “external” attentional demands impact one another. For one, performance on the visual search portion of the task related to recognition accuracy at the WM probe—both on average across participants and on a trial-by-trial basis within participants—suggesting that visual search and WM maintenance are mutually reliant on attention. Second, an interaction in the univariate fMRI response in frontoparietal regions indicated that the neural response to load in one domain (i.e., WM) was strongly influenced by the load in the other domain (i.e., visual search). Finally, the discriminability of multivariate patterns of WM category activity in extrastriate visual cortex (specifically PPA) was reduced under higher visual attentional demands and related to the speed of performance of the search task, suggesting that the quality of the sensory cortical WM representation may be influenced by the amount of available attention during the WM maintenance period.
Although the univariate interaction effect emerged in frontal and parietal regions that have often been implicated in WM and attentional processes (Constantinidis & Klingberg, 2016; Eriksson, Vogel, Lansner, Bergström, & Nyberg, 2015; Curtis & D'Esposito, 2003), the pattern of activation observed here was novel and is consistent with the possibility that different combinations of attentional load demands may provoke distinct task strategies (cf. Derrfuss et al., 2017). That is, rather than a quantitative increase in “neural effort” (i.e., linear activation increases) with each increasing level of task demand, the heightened response to WM load during the harder visual search was reversed during easier search (Figure 3B). If each increasing level of load engaged frontoparietal regions more strongly, we would have expected a greater response to high WM load, even when the search was easy. Instead, high WM load related to less activity (vs. low load) when the search was easy. Combined with the correlation between search RT and WM accuracy—whose direction also flips between easy and hard search conditions (Figure 2E)—these data suggest that a harder visual attention task might invoke a qualitatively different WM maintenance or cognitive control strategy than the one used when the secondary task is easier. This is further suggested by the (counterintuitively) faster WM probe recognition after harder search series and faster visual search performance when two items were maintained in WM (as opposed to one). These findings may reflect reduced processing of distracting stimuli (and better performance) when attentional demands were high (Kim et al., 2005; Lavie, Hirst, de Fockert, & Viding, 2004; de Fockert et al., 2001) or may suggest that higher demands provoked greater engagement of cognitive control and therefore benefitted ongoing performance (Waskom et al., 2014; Jha & Kiyonaga, 2010). 
Thus, load demands in one domain clearly impact performance in the other domain, and strategies for managing dual-task demands may differ under different load conditions.
Critically, event-related pattern classification within the PPA demonstrated the sensitivity of neural WM category information to visual attentional demand levels (Figure 4). Regardless of the difficulty condition of the visual search, WM decoding accuracy sharply declined during the WM delay. Most importantly, however, this reduction in decoding accuracy during the search sequence was more pronounced when the search was more time consuming, even though the amount of perceptual input was matched in the easy and hard search conditions. The more that attention was required to complete the delay-spanning visual search, the more the detection of neural WM category representations in the ventral visual cortex suffered. WM category classification was also predicted by behavioral performance (i.e., search RTs), suggesting that a longer time spent on the visual search task diverted attention from WM maintenance for longer and WM pattern classification therefore suffered more. Finally, search RT also related to WM accuracy, suggesting that the amount of time spent on the delay-spanning task determines both the discriminability of category-diagnostic patterns in PPA and WM accuracy.
Although visual attentional demands impacted the classification of the WM category representation, the reduction in WM category discriminability under attentional load did not lead to an overall deterioration in behavioral WM recognition performance. This is consistent with a recent finding that WM orientation decoding in visual cortex is disrupted by irrelevant perceptual distraction, without an impairment to task performance (Bettencourt & Xu, 2016). Taken together with the univariate interaction results, the data suggest that a visual WM representation strategy may be more feasible when attentional demands are lenient but that WM content must be maintained by some other strategy when visual attention is concurrently taxed (Derrfuss et al., 2017; Olivers et al., 2011). These results are consistent with earlier indications that WM representations can be transferred into a different activation status to prioritize the immediately relevant task and then restored into the focus of attention when they are needed to guide behavior (LaRocque et al., 2012, 2016; Sprague et al., 2016; Kiyonaga et al., 2012; Lewis-Peacock et al., 2012), suggesting that different attentional states may promote distinct means of WM retention. Indeed, interest has grown recently in characterizing a hidden or “silent” WM coding scheme (Stokes, 2015). For instance, neural evidence for previously irrelevant (i.e., silent) WM items can be restored by external stimulation (Wolff, Jochim, Akyürek, & Stokes, 2017; Rose et al., 2016; Wolff et al., 2015), suggesting that this representational state may be implemented via patterned short-term changes in network synaptic weights (LaRocque et al., 2014; Stokes et al., 2013; Erickson, Maramara, & Lisman, 2009). It remains unclear, however, why and when an activity-silent maintenance strategy is used (as opposed to persistent activity).
Our findings raise the intriguing possibility that such a representational format—that is undetectable with the fMRI methods used here—might be relied on specifically when WM information must be maintained in the absence of sustained attention toward the WM content.
A broad searchlight classifier also decoded the WM category during a delay-spanning series of visual searches, in the local patterns conveyed by clusters of voxels around the ventral visual regions that typically respond to perception of the WM categories (i.e., fusiform and parahippocampal gyri). This result supports the notion that WM maintenance is achieved through activation of sensory representations and marks an informative advance in the limits of WM decoding. That is, prior studies have successfully decoded WM content from visual cortices over an unfilled delay interval (i.e., no other perceptual input; Emrich et al., 2013; Ester et al., 2013; Riggall & Postle, 2012; Harrison & Tong, 2009; Serences et al., 2009) and from superior parietal cortex over an interval that included task-irrelevant perceptual input (Bettencourt & Xu, 2016), leaving open the question of what happens to (sensory) WM representations in the face of (complex) incoming sensory signals that require attention. Of course, the information must be represented somehow, because it is retrieved after completion of the search task, but these data suggest that a visual representational format can still be employed, even during a concurrent visual search task.
Multivariate decoding of the WM category was consistent with expectations, but these results bear several further considerations. For one, WM classification in this study was performed at the category level, rather than the finer-grained level of specific exemplars. Our decoding analysis could therefore be interpreted to reflect a more abstract representation of the current task, rather than the particular WM sample, per se. Recent studies, however, support the notion that classifier evidence in sensory and category-responsive regions does indeed convey item-specific information (LaRocque et al., 2016; Rose et al., 2016). A task with more abstract stimulus representation demands might be expected to influence activity patterns in more dorsal and anterior brain regions, as opposed to visual regions (Christophel, Klink, Spitzer, Roelfsema, & Haynes, 2017). Moreover, this study used all novel WM stimuli, whereas WM capacity and representational format may be dramatically impacted by stimulus familiarity and real-world relevance (Brady, Störmer, & Alvarez, 2016; Endress & Potter, 2014). Regardless, the fact that category decoding is influenced by visual search difficulty serves as evidence that information about WM representation (whether it be abstract or specific) is affected by simultaneous attentional load. Future investigations should probe the specificity of distributed WM representations and how they are influenced by factors like stimulus abstraction and novelty.
Although many prior studies of WM decoding have used a retro-cue procedure to differentiate perceptual from maintenance activity, here, the contributions of residual perceptual activity are primarily abated by decoding WM category information during a secondary perceptual task. That is, any perceptual activity related to the WM sample is unlikely to persist across exposure to a series of additional perceptual stimuli (and a delay of 12.5–15 sec), supporting our interpretation that the decoded category information is related to WM maintenance. Several previous studies have already established that WM category maintenance activity for faces and scenes can be decoded from extrastriate visual regions (Lorenc, Lee, Chen, & D'Esposito, 2015; Sreenivasan, Vytlacil, et al., 2014). Our goal was instead to examine whether such activity patterns are sensitive to visual attentional task demands, whereby any potential confounds due to perceptual bleed-over would affect all task conditions equally, because secondary perceptual input was equated in all trial types. An important question for future research will be to determine how representations of WM content are influenced by other demands, such as distraction from perceptually similar stimuli (cf. Gayet et al., 2017; Soto, Humphreys, & Rotshtein, 2007; Yoon, Curtis, & D'Esposito, 2006; Postle, 2005; Jha, Fabian, & Aguirre, 2004).
A critical function of WM is to maintain information in the face of competing demands, yet surprisingly little is known about how such attentional demands interact with WM storage. The present findings suggest that attention is necessary to maintain detectable visual WM category representations in sensory areas, but those distributed activity patterns cannot be the sole functional substrate of WM maintenance (Derrfuss et al., 2017; Lee & Baker, 2016; Bettencourt & Xu, 2016; Ester et al., 2015), because the material can still be remembered when WM decoding falls to chance. Thus, although the quality of sensory multivariate evidence for a WM item has recently been taken to reflect the precision of the WM representation (Emrich et al., 2013; Ester et al., 2013), we must explore additional maintenance formats to fully understand how we are best able to juggle our internal goals with persistent concurrent demands for our attention.
We thank Jiefeng Jiang and Phil Kragel for help in the analysis. This research was supported in part by National Institute of Mental Health Award R01MH087610 to T. E.
Reprint requests should be sent to Tobias Egner, Department of Psychology and Neuroscience and Center for Cognitive Neuroscience, Duke University, Durham, NC 27708, or via e-mail: firstname.lastname@example.org.
These authors contributed equally to this work.