Visual STM of simple features is achieved through interactions between retinotopic visual cortex and a set of frontal and parietal regions. In the present fMRI study, we investigated effective connectivity between central nodes in this network during the different task epochs of a modified delayed orientation discrimination task. Our univariate analyses demonstrate that the inferior frontal junction (IFJ) is preferentially involved in memory encoding, whereas activity in the putative FEFs and anterior intraparietal sulcus (aIPS) remains elevated throughout periods of memory maintenance. We have earlier reported, using the same task, that areas in visual cortex sustain information about task-relevant stimulus properties during delay intervals [Sneve, M. H., Alnæs, D., Endestad, T., Greenlee, M. W., & Magnussen, S. Visual short-term memory: Activity supporting encoding and maintenance in retinotopic visual cortex. Neuroimage, 63, 166–178, 2012]. To elucidate the temporal dynamics of the IFJ-FEF-aIPS-visual cortex network during memory operations, we estimated Granger causality effects between these regions with fMRI data representing memory encoding/maintenance as well as during memory retrieval. We also investigated a set of control conditions involving active processing of stimuli not associated with a memory task and passive viewing. In line with the developing understanding of IFJ as a region critical for control processes with a possible initiating role in visual STM operations, we observed influence from IFJ to FEF and aIPS during memory encoding. Furthermore, FEF predicted activity in a set of higher-order visual areas during memory retrieval, a finding consistent with its suggested role in top–down biasing of sensory cortex.
Visual STM (VSTM) is accomplished through the recruitment of visual and frontoparietal areas in the human brain (Gazzaley & Nobre, 2012; D'Esposito, 2007). Recently, it has been demonstrated that parts of visual cortex involved in the sensory encoding of stimuli to be remembered retain stimulus information throughout prolonged retention intervals during which no external cues are available (Harrison & Tong, 2009; Serences, Ester, Vogel, & Awh, 2009). Inferior frontal junction (IFJ), an area situated in the frontal lobe at the intersection of the inferior frontal sulcus and precentral sulcus, was recently found to modulate visual cortex in a top–down fashion during memory encoding (Zanto, Rubens, Thangavel, & Gazzaley, 2011) and has generally been associated with cognitive control and memory updating operations (Brass, Derrfuss, Forstmann, & von Cramon, 2005). Following memory encoding, two areas outside of visual cortex, the FEFs and intraparietal sulcus (IPS), have been reported to show sustained activity during periods of memory maintenance (Srimal & Curtis, 2008; Curtis & D'Esposito, 2006). Most studies finding such activity patterns in FEF and IPS have used spatial working memory tasks where participants maintain information about locations in space during memory intervals (for a review, see Curtis, 2006). FEF has, however, also been associated with maintenance of memory traces when the spatial component is irrelevant to perform the task and has thus been suggested to play a more general role in VSTM (Offen, Gardner, Schluppeck, & Heeger, 2010). In the latter study, no sustained activity was observed in IPS during memory intervals. The study did not, however, separate between anterior and posterior parts of the IPS (aIPS and pIPS, respectively), and some findings suggest that the two parts differ in their involvement in memory tasks not probing spatial memory (Xu & Chun, 2006).
The interactions between visual cortex, IFJ, and FEF/IPS during VSTM have not been studied thoroughly. In the current fMRI study, we have investigated effective connectivity between the nodes in the delineated structural model during different VSTM task operations. Participants were engaged in a delayed orientation discrimination task requiring encoding, maintenance, and retrieval of low-level visual information. Additional control tasks were also included, involving active processing of stimuli not part of the memory task and passive viewing of identical stimuli. To investigate temporal patterns of influence, we applied Granger causality (GC) analysis (Roebroeck, Formisano, & Goebel, 2005) in an event-related fashion on periods representing the different task requirements of the experiment. We hypothesized that IFJ would be highly engaged during memory encoding operations and, in line with recent evidence, have an initiating effect on regions involved in memory maintenance of task-relevant representations. Furthermore, FEF and IPS can have direct modulatory effects on visual cortex during periods of task expectancy (Bressler, Tang, Sylvester, Shulman, & Corbetta, 2008) and visual processing during discrimination (Moore & Armstrong, 2003). We thus hypothesized that these frontoparietal regions would interact with visual cortex during the discrimination phases of the experiment.
Participants and Scanning Sessions
This study is based on a data set that was previously used to assess memory encoding in retinotopic visual cortex using univariate analyses (Sneve, Alnæs, Endestad, Greenlee, & Magnussen, 2012). Six trained participants (age = 21–28 years; one woman) gave informed written consent to participate in the study, which was approved by the Regional Committee for Medical and Health Research Ethics (South-East Norway). After two days of training on the task in a standard laboratory setting, task discrimination thresholds were estimated for each participant in the scanner. The main experiments comprised seven scanning sessions per participant, distributed over several days. Finally, each participant completed two sessions of retinotopic mapping and ROI localization.
Stimuli and Experimental Procedures
All stimuli were generated using the Psychophysics toolbox for Matlab (Brainard, 1997). Visual stimuli were projected onto a screen inside the scanner at a resolution of 1400 × 1050. Participants viewed the screen through a mirror attached to the head coil and responded using the ResponseGrip system (NordicNeuroLab, Bergen, Norway). Auditory stimuli were presented to the participants' headphones through the scanner intercom.
For all tasks, visual stimuli were sinusoidal grating annuli with an inner radius of 1° and a maximum Michelson's contrast of 0.6, centrally presented on a gray screen around a white fixation dot (0.1° in diameter). The gratings were convolved with a 2-D Gaussian kernel and reached half maximum contrast at a radius of 3.3° from center. The transition between the central gray area and the stimulus pattern was smoothed to avoid cues about stimulus orientation from sharp edges. Grating orientation in a trial was either 45° or 135° (jittered with a random value in the range ±5°), and the spatial frequency could be 1.5 or 3 (±0.1) cycles per degree. The jittering, as well as random shifts in the phase of the sinusoid across trials, was applied to prevent build-up of long-term representations of the stimuli and to discourage participants from using categorical encoding strategies (Lages & Treisman, 1998). The different spatial frequency/orientation combinations were counterbalanced and presented in a randomized order across the experiment.
The experimental structure is illustrated in Figure 1A. At the start of each trial, the appearance of the fixation mark signaled the upcoming sample stimulus. Participants were instructed to fixate on this dot as long as it was present. After a 1.5-sec fixation period, the sample was presented for 0.5 sec, and the participant's task was to remember the precise orientation of its grating pattern. Following the sample, there was a delay interval of varying length in which only the fixation mark was present (6–14 sec, exponential distribution over five discrete intervals, mean duration = 7.9 sec, STD = 2.4 sec). The delay interval ended with the presentation of a tone cue (500 or 1000 Hz; duration = 0.25 sec), which in 75% of the trials signaled that the upcoming task would require a comparison of the test stimulus and the memory representation of the sample (memory-based discrimination). In 25% of the trials, the tone instructed the participant to prepare for a difficult orientation–discrimination task for which the remembered information was irrelevant (stimulus-based discrimination). The cueing roles of the two tones were counterbalanced across participants. After the tone cue, there was a delay of 1–3 sec (uniform distribution) before the cued test stimulus appeared for 0.5 sec (unless it was a partial trial, see below). In memory-based discrimination trials, the test stimulus was identical to the sample grating, except for a slight clockwise or counterclockwise shift in orientation, corresponding to the individually estimated 75% memory discrimination threshold. In stimulus-based discrimination trials, the spatial frequency of the test stimulus was identical to the sample grating, but its orientation would now be shifted clockwise or counterclockwise relative to vertical. This shift corresponded to the individually estimated 75% stimulus discrimination threshold, meaning that the two tasks were approximately similar in their difficulty. 
Both directions of change (clockwise/counterclockwise relative to sample orientation or relative to vertical) occurred with equal probability, and participants indicated the direction of this shift with a button press (two-alternative forced-choice). In 25% of the trials, no test stimulus followed after the tone cue. These partial trials were included to allow the disentangling of the fMRI responses to the tones from the responses to the test stimuli (Ollinger, Shulman, & Corbetta, 2001). The disappearance of the fixation point signaled the end of a trial. After the offset of the test stimulus, there was an intertrial interval of minimum 12 sec to allow the hemodynamic response to return to baseline.
Each experimental run started and ended with a baseline measurement period (20 sec) and consisted of 32 trials (duration of a run = 13 min). All participants completed 14 runs of the experiment; thus, the total number of trials per participant was 448. All trial types (full/partial, memory/stimulus discrimination) and delay durations were present in a run, and were presented in a quasi-randomized order to ensure that participants never knew in advance when a trial would end or what kind of trial it would be. One run thus consisted of 32 presentations of the sample, 18 presentations of the memory test, and 6 presentations of the discrimination test stimulus. In addition to the main experiment, all participants completed three runs of a passive viewing task in which they were instructed to attend to the stimuli without performing any tasks. Passive viewing trials were identical to the main experiment trials, with the exception that the tone cues did not have any predictive value (either test stimulus could follow after a cue), and the different cue types/test stimuli occurred with equal probability. Furthermore, to ensure that the participants did not silently perform any task on the “test” gratings, the “memory test” stimulus had the same orientation as the “sample” in a trial, and the “discrimination test” stimulus was always vertical.
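The trial counts given above follow from the cue and partial-trial proportions. The arithmetic can be sketched as follows, assuming (this is not stated explicitly in the text) that the 25% partial trials were distributed evenly over the two cue types; the variable names are illustrative only.

```python
# Worked arithmetic for the trial counts reported in the text.
# Assumption: partial trials (no test stimulus) are spread evenly
# across memory-cue and discrimination-cue trials.
trials_per_run = 32
runs_per_participant = 14
p_memory_cue = 0.75   # tone signals memory-based discrimination
p_partial = 0.25      # no test stimulus follows the tone

memory_cue_trials = round(trials_per_run * p_memory_cue)
discrimination_cue_trials = trials_per_run - memory_cue_trials

memory_tests_per_run = round(memory_cue_trials * (1 - p_partial))
discrimination_tests_per_run = round(discrimination_cue_trials * (1 - p_partial))

total_trials = trials_per_run * runs_per_participant
total_memory_tests = memory_tests_per_run * runs_per_participant
total_discrimination_tests = discrimination_tests_per_run * runs_per_participant

print(memory_tests_per_run, discrimination_tests_per_run, total_trials)
```

Under this assumption the per-run counts match the 18 memory tests and 6 discrimination tests reported above, and the totals across 14 runs match the number of retrieval and discrimination-only trials entering the GC analysis.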
Individual 75% discrimination thresholds were estimated separately for the two tasks in the experiment, using an adaptive maximum likelihood procedure (QUEST; Watson & Pelli, 1983). The trial structure used to probe memory-based discrimination thresholds was similar to the main experiment, but the sample–cue interval was held constant at 7.9 sec (corresponding to the mean duration of this interval in the main experiment). To estimate stimulus-based discrimination thresholds, the trial structure was modified to only contain tone cue and test stimulus. For both tasks, the QUEST algorithm varied the orientation of the test stimulus around a reference orientation (sample orientation in a trial/vertical), aiming for the difference that produced discrimination accuracies of 75%. Individual average threshold levels (estimated from 3 × 32 trials on each of the tasks) were used as orientation differences in the main experiment.
Imaging was performed with a Philips Achieva 3 T whole-body MR unit equipped with an eight-channel Philips SENSE head coil (Philips Medical Systems, Best, the Netherlands). The functional imaging parameters were equivalent across all scanning sessions: 28 transversally oriented slices (no gap) were measured using a BOLD-sensitive T2*-weighted EPI sequence (repetition time [TR] = 1500 msec, echo time = 35 msec, flip angle = 74°, voxel size = 3 × 3 × 3 mm, field of view = 192 × 192 mm, interleaved acquisition). The slices were oriented to cover occipital cortex, parietal cortex, and dorsal frontal cortex. At the start of each fMRI run, six dummy volumes were collected to avoid T1 saturation effects in the analyzed data. Anatomical T1-weighted images consisting of 180 sagittally oriented slices were obtained using a turbo field echo pulse sequence (TR = 8.125 msec, echo time = 3.72 msec, flip angle = 8°, voxel size = 1 × 1 × 1 mm, field of view = 256 × 256 mm).
Imaging data were preprocessed using BrainVoyager QX software (version 2.3, Brain Innovation, Maastricht, The Netherlands). For each participant, a set of T1-weighted images was corrected for spatial intensity inhomogeneities, coregistered, and averaged together to produce a single high-resolution anatomical volume. White-gray matter boundaries of these volumes were estimated, and bridges were removed. On the basis of the white matter segment of each hemisphere, 3-D meshes of the cortical surfaces were created. The functional images were slice timing corrected to the first slice in a volume using windowed sinc interpolation routines. Head motion after the first volume in a scan was detected using linear interpolation and corrected by iterative adjustments of rigid body parameters using a nonlinear least-squares optimization algorithm (Levenberg–Marquardt) and windowed sinc interpolation. Finally, each functional scan was coregistered (rigid body) against the individual whole-brain anatomical volume using a normalized gradient field algorithm. Because ROIs were precisely localized for each participant, no spatial smoothing was applied, nor were the imaging data transformed into normalized space.
Standard retinotopic mapping routines were used to delineate borders of individual visual areas in early (V1, V2, V3, V4) as well as late (V3a/b, LO1, LO2) visual cortex (Wandell, Dumoulin, & Brewer, 2007). Voxels responsive to the items presented in the main experiment were localized using a flickering (10 Hz) radial checkerboard annulus of similar configuration as the high-contrast parts of the grating stimuli (inner radius of 1°, outer radius of 3.3°). The localizer stimulus was presented 8 × 15 sec within a run (three runs per participant) and was interleaved with 15-sec rest periods. A general linear model (GLM) was set up with a regressor representing the expected hemodynamic response function (HRF; canonical two-gamma) to the onset periods of the flickering annulus. Low-frequency drifts were removed using a temporal high-pass filter (0.0125 Hz), and intrinsic autocorrelations were modeled. Within each visual area and hemisphere, voxels resembling the regressor with a significance rate of p < .001 (false discovery rate-corrected) were considered an ROI representing stimulus-responsive parts of that part of retinotopic visual cortex.
To establish ROIs outside of visual cortex, a combined anatomical/functional approach was used. Left and right IFJ, FEF, aIPS, and pIPS were drawn manually on inflated surface reconstructions of each participant's hemispheres (Figure 3). The anatomical ROI representing IFJ encompassed the junction between the posterior inferior frontal sulcus and the inferior precentral sulcus (Derrfuss, Vogt, Fiebach, von Cramon, & Tittgemeyer, 2012). To ensure that our definition of IFJ was precise, we estimated the ROIs' centers of gravity in MNI space after registering each participant's anatomical volume to the MNI152 standard brain and transforming the ROIs using the estimated registration parameters. The mean left IFJ location in our study was −44,3,33 (x,y,z), and the mean right IFJ location was 45,1,35. This corresponds well with the average location of left IFJ reported by Derrfuss et al. (2012): −42,6,33. FEF was defined as the intersection between the superior frontal sulcus and the superior precentral sulcus (Offen et al., 2010), excluding the most dorsal parts of the superior precentral sulcus, which have been associated with visuomotor hand conditional activity (Amiez, Kostopoulos, Champod, & Petrides, 2006). Finally, aIPS and pIPS were defined as the anterior and posterior half of the IPS, respectively. The ROIs were further restricted to only contain voxels that responded consistently to the visual stimuli presented in the main experiment. A GLM was set up for each participant separately, modeling the experiment with five regressors, representing onsets of the sample, the two test stimuli, and the tone cues, convolved with canonical HRFs. Within each anatomical ROI, voxels that survived a p < .001 (false discovery rate-corrected) threshold on the conjunction test of main effects of all visual stimuli were kept for further univariate analysis.
Note that, although this selection criterion ensured that the ROIs consisted of stimulus-sensitive voxels, it did not impose any biases on the subsequent univariate analyses, which aimed at comparing responses across experimental conditions. Moreover, for the effective connectivity analyses, all ROIs were further restricted to only contain the 25 most stimulus-sensitive voxels, defined as the 25 most significant voxels according to the checkerboard localizer (ROIs in visual cortex) or the conjunction test (ROIs outside of visual cortex).
For the experimental data, intensity time courses from all voxels within an ROI were high-pass filtered (cutoff = 0.008 Hz) and averaged. The resulting single time series from each ROI were then investigated with two univariate GLMs. The first model aimed at finding ROIs showing sustained activity during the maintenance interval and was set up using five regressors representing the stimulus events (sample, test stimuli, tone cues) as well as a “retention regressor” covering the delay period between sample offset and cue onset (6–14 sec). All regressors were convolved with canonical HRFs (time to response peak = 5 sec, time to undershoot peak = 15 sec), and serial correlations were removed using a second-order (AR(2)) model (Monti, 2011). Significance of observed activity in the retention interval was estimated both on an individual participant level and in group analyses, treating the different participants as random samples.
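As an illustration of how a retention regressor of this kind can be constructed, the following is a minimal sketch and not the BrainVoyager implementation used in the study. It assumes a two-gamma HRF with unit dispersion and a response-to-undershoot ratio of 6 (neither value is reported here), and all function names and the example onset are hypothetical.

```python
import math
import numpy as np

TR = 1.5   # repetition time in seconds (from the text)
DT = 0.1   # fine time grid for the convolution (assumption)

def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] ** (shape - 1) * np.exp(-t[pos]) / math.gamma(shape)
    return out

def two_gamma_hrf(t, peak=5.0, undershoot=15.0, ratio=6.0):
    """Two-gamma HRF. With unit scale, a gamma density peaks at shape - 1."""
    return gamma_pdf(t, peak + 1.0) - gamma_pdf(t, undershoot + 1.0) / ratio

def boxcar_regressor(onset_s, duration_s, n_vols):
    """Convolve a boxcar with the HRF and sample it once per volume."""
    step = int(round(TR / DT))
    t = np.arange(n_vols * step) * DT
    box = ((t >= onset_s) & (t < onset_s + duration_s)).astype(float)
    hrf = two_gamma_hrf(np.arange(320) * DT)        # 32 s kernel
    conv = np.convolve(box, hrf)[: len(t)] * DT
    return conv[::step][:n_vols]

# Hypothetical trial: sample offset at 2 s, followed by an 8 s delay.
hrf_time = np.arange(320) * DT
hrf_peak_time = hrf_time[np.argmax(two_gamma_hrf(hrf_time))]
retention = boxcar_regressor(onset_s=2.0, duration_s=8.0, n_vols=40)
```

The resulting vector has one value per volume and can be entered as a column of the GLM design matrix alongside the event regressors.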
The second model aimed at finding regions with BOLD response patterns characteristic for the involvement in memory encoding. We operationalized this as the observation of stronger responses to the sample stimulus compared with sensorially identical stimuli requiring active processing, but no encoding into a memory representation (i.e., the discrimination test stimuli). This analysis was restricted to the ROIs outside of visual cortex, because we already have investigated the role of visual areas in memory encoding using the same data set (Sneve et al., 2012). A deconvolution model was set up to estimate the actual hemodynamic response produced by the individual stimuli (sample, test stimuli, tone cues) in the main experiment and for the passive viewing runs. Each stimulus event was modeled by 13 stick predictors, one per volume, covering the extent of the expected response. Following correction for serial correlations, the peak amplitudes from the resulting BOLD time series were used as estimates of response strengths to the different stimulus types. Significant differences in peak activity following the different stimuli were established on a group level.
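The deconvolution approach can be sketched as a finite impulse response design in which each event type contributes 13 stick predictors, one per volume after onset. The example below is a simplified, noise-free illustration with hypothetical onsets and a hypothetical response shape; it omits the correction for serial correlations described above.

```python
import numpy as np

def fir_design(onsets, n_vols, n_sticks=13):
    """One column per post-onset volume; a 1 marks 'event started k volumes ago'."""
    X = np.zeros((n_vols, n_sticks))
    for onset in onsets:
        for k in range(n_sticks):
            if onset + k < n_vols:
                X[onset + k, k] += 1.0
    return X

# Hypothetical response shape (arbitrary units), 13 volumes long.
true_response = np.array(
    [0.0, 0.2, 0.6, 1.0, 0.8, 0.5, 0.2, 0.0, -0.1, -0.1, -0.05, 0.0, 0.0]
)

# Hypothetical onsets in volumes; some responses overlap in time.
onsets = [5, 14, 40, 47]
n_vols = 80

X = fir_design(onsets, n_vols)
bold = X @ true_response                       # simulated, noise-free signal
betas, *_ = np.linalg.lstsq(X, bold, rcond=None)

peak_amplitude = betas.max()                   # estimate of response strength
peak_volume = int(betas.argmax())
```

Because the model makes no assumption about response shape, overlapping responses are separated by the least-squares fit as long as the event timing is sufficiently jittered.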
Effective connectivity during the different task epochs was established between the ROIs under investigation using the Granger Causal Connectivity Analysis toolbox for Matlab (Seth, 2010). The Granger approach has been thoroughly described elsewhere (Ding, Chen, & Bressler, 2006; Roebroeck et al., 2005), but in brief: if including the past of time series B helps explain the present of time series A better than the past of time series A alone does, B is said to “Granger cause” A. One criticism against the use of GC analysis on BOLD data is based on the fact that different brain regions can have different HRFs (Chang, Thomason, & Glover, 2008). Thus, when finding GC effects between BOLD responses in two different brain regions, one cannot know whether it is because of temporal differences in the underlying neuronal activity or spurious effects resulting from regional differences in the vascular conversion of this activity (Friston, 2009; David et al., 2008). A suggested approach to this problem is to always compare GC results over conditions/cognitive contexts, since any observed systematic variation because of experimental intervention should occur on top of the constant structural differences in hemodynamic processes (Bressler & Seth, 2011; Roebroeck, Formisano, & Goebel, 2011). In this study, we therefore performed GC analysis on the different experimental epochs separately and only considered consistent differences in effects across conditions as findings of relevance.
The GC analysis focused on four conditions (Figure 1B): (1) Memory encoding, operationalized as the first six volumes (9 sec) following sample onset in the main experiment; (2) Memory retrieval (the first six volumes following memory test onset); (3) Discrimination only (the first six volumes following discrimination test onset); (4) Passive viewing, defined as the first six volumes following any visual stimulus in the passive viewing experiment. To ensure that all conditions were similar with respect to sensory stimulation, only trials with sample–cue delays of more than 9 sec were included in the definition of the memory encoding and passive viewing conditions. The retrieval and discrimination only conditions included data from the intertrial interval, and no other stimuli were presented during this period. Consequently, all four conditions entering GC analysis were identical in the sense that they consisted of six volumes of BOLD data following 0.5 sec grating stimulation; however, the task performed on the stimulus varied. The number of trials that went into the GC analysis per condition per participant was 224 (Memory encoding), 252 (Memory retrieval), 84 (Discrimination only), and 120 (Passive viewing). Time series representing the four conditions were extracted separately from 10 ROIs in each hemisphere (7 visual areas + IFJ, FEF, and aIPS). pIPS did not show sustained activity during maintenance, nor univariate encoding effects and was therefore not considered a central node in the network of investigation. All time series were mean-centered, detrended, and averaged across voxels in an ROI in a trial-wise fashion. Because, for each condition, we had multiple unconnected repetitions of short duration (9 sec), we considered each occurrence as a realization of a common underlying stochastic process (Seth, 2010; Ding et al., 2006). 
Accordingly, GC between ROI pairs was first computed for each realization, then averaged across realizations, using a first-order autoregressive model (i.e., including observations one TR back in time; Wen, Yao, Liu, & Ding, 2012; Bressler et al., 2008). Additional analyses using AR(2) models did not significantly alter the results. Although this analysis produces bidirectional interactions (i.e., influence of Region 1 on Region 2 as well as influence of Region 2 on Region 1), we chose to focus on the difference of these scores, which is argued to be a better approximation to true directions of influence when GC analysis is applied to fMRI data (Kayser, Sun, & D'Esposito, 2009; Roebroeck et al., 2005).
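The logic of the pairwise analysis can be illustrated with a minimal sketch. This is not the toolbox implementation used in the study: it assumes an ordinary least-squares fit of a first-order model without intercept, computes GC per six-volume realization as the log ratio of residual sums of squares, and averages across realizations. The simulated data and all function names are hypothetical.

```python
import numpy as np

def rss(target, predictors):
    """Residual sum of squares of a least-squares fit."""
    beta, *_ = np.linalg.lstsq(predictors, target, rcond=None)
    resid = target - predictors @ beta
    return float(resid @ resid)

def granger_one_lag(source, target):
    """GC from source to target within one realization, one TR back in time."""
    source = source - source.mean()
    target = target - target.mean()
    present = target[1:]
    own_past = target[:-1, None]                          # restricted model
    both_past = np.column_stack([target[:-1], source[:-1]])  # full model
    return np.log(rss(present, own_past) / rss(present, both_past))

def difference_of_influence(a_trials, b_trials):
    """Mean GC(a -> b) minus mean GC(b -> a) across realizations."""
    a_to_b = np.mean([granger_one_lag(a, b) for a, b in zip(a_trials, b_trials)])
    b_to_a = np.mean([granger_one_lag(b, a) for a, b in zip(a_trials, b_trials)])
    return a_to_b - b_to_a

# Simulated example: region A drives region B with a one-volume lag.
rng = np.random.default_rng(0)
n_trials, n_vols = 200, 6
a = rng.standard_normal((n_trials, n_vols))
b = np.zeros_like(a)
b[:, 0] = rng.standard_normal(n_trials)
b[:, 1:] = 0.8 * a[:, :-1] + 0.5 * rng.standard_normal((n_trials, n_vols - 1))

doi = difference_of_influence(a, b)   # positive: net influence from A to B
```

With only five usable observations per realization, single-realization estimates are noisy and biased upward in both directions; taking the difference of the two directions and averaging over many realizations is what makes the net direction interpretable in this sketch.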
We computed “difference of influence” GC values between pairs of ROIs separately for each participant and condition. Specifically, we examined connections between IFJ, FEF, and aIPS within the same hemisphere as well as between these higher-order ROIs and ipsilateral/contralateral visual cortex. The observed GC values were tested for significance on a group level and had to fulfill two criteria to be considered a positive finding: (1) GC values had to be different from zero and (2) GC values had to be different from the values observed in the other conditions. After adjusting the alpha level because of multiple comparisons (30 connections; corrected alpha = .0017), bootstrap resampling of the participants' GC values was used to calculate confidence intervals around the sample means and assess significance. One thousand bootstraps were run per connection.
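The corrected alpha level and the bootstrap test against zero can be sketched as follows. The percentile method for the confidence interval is an assumption, since the specific bootstrap interval is not stated here, and the GC values are invented for illustration.

```python
import numpy as np

n_connections = 30
alpha = 0.05 / n_connections   # Bonferroni-adjusted, approximately .0017

def bootstrap_ci(values, alpha, n_boot=1000, seed=0):
    """Percentile bootstrap CI around the mean of per-participant values."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    lower, upper = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

# Hypothetical difference-of-influence values for six participants.
gc_values = [0.021, 0.035, 0.012, 0.028, 0.040, 0.017]
lower, upper = bootstrap_ci(gc_values, alpha)
differs_from_zero = lower > 0 or upper < 0
```

The second criterion, a difference from the other conditions, could be tested in the same way by bootstrapping the within-participant differences between two conditions.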
Next, to investigate whether any of the observed influences were of behavioral significance, we divided each participant's data set into “error trials” and “fast correct trials.” Error trials were picked from the subset of trials with sample–cue delays of 9 sec or more and included all memory test trials in which the participant produced a wrong response (31.5 trials on average). Fast correct trials were defined as memory test trials in which the participant produced a correct response and where the RTs were sufficiently low (defined as the n shortest RTs a participant produced on correct response trials, with n being equal to the number of error trials). The speed criterion was included to account for the fact that because of the experiment's 2AFC response structure, participants would produce on average 50% correct responses even when guessing. In general, participants were faster at responding during correct trials (M = 800 msec) than during error trials (M = 926 msec; t(5) = 4.55; p = .006), thus RTs should be shorter during nonguessing trials.
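The matching of fast correct trials to error trials can be sketched as below. The trial records and field names are hypothetical; the only rule taken from the text is that the n fastest correct trials are kept, with n equal to the number of error trials.

```python
def split_trials(trials):
    """Return (error_trials, fast_correct_trials) with equal counts."""
    errors = [t for t in trials if not t["correct"]]
    correct = sorted((t for t in trials if t["correct"]), key=lambda t: t["rt"])
    fast_correct = correct[: len(errors)]   # n shortest RTs, n = number of errors
    return errors, fast_correct

# Hypothetical memory test trials for one participant (RT in msec).
trials = [
    {"correct": True, "rt": 720},
    {"correct": False, "rt": 950},
    {"correct": True, "rt": 1010},
    {"correct": True, "rt": 680},
    {"correct": False, "rt": 890},
    {"correct": True, "rt": 805},
]
errors, fast_correct = split_trials(trials)
fast_rts = [t["rt"] for t in fast_correct]
```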
Following pairwise GC analysis, we performed conditional GC analysis on a selection of ROIs and conditions. A problem with pairwise GC is that observed effects between two ROIs in reality can be caused by a third region. For example, if Region 3 first influences Region 1 and after a short delay influences Region 2, this can be observed as a direct, but nonexistent, influence from Region 1 to Region 2 (Ding et al., 2006). Conditional GC analysis involves incorporating a third region (or several) in the computation, accounting for the variance explained by its past when calculating the influence between the two primary ROIs. Note that the difference of influence term was calculated on conditioned GC values in these analyses (following the procedure of Kayser et al., 2009; see also Kayser, Erickson, Buchsbaum, & D'Esposito, 2010).
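The common-driver problem and its resolution by conditioning can be demonstrated with a simulated example. This is a sketch under simplifying assumptions, not the analysis performed in the study: it uses one long time series instead of short realizations, a first-order model without intercept, and a third region that influences Region 1 within the same sample and Region 2 one sample later.

```python
import numpy as np

def rss(target, predictors):
    """Residual sum of squares of a least-squares fit."""
    beta, *_ = np.linalg.lstsq(predictors, target, rcond=None)
    resid = target - predictors @ beta
    return float(resid @ resid)

def gc_one_lag(source, target, conditioning=None):
    """GC source -> target; optionally conditioned on a third series."""
    columns = [target[:-1]]
    if conditioning is not None:
        columns.append(conditioning[:-1])   # past of the third region
    restricted = np.column_stack(columns)
    full = np.column_stack(columns + [source[:-1]])
    present = target[1:]
    return np.log(rss(present, restricted) / rss(present, full))

# Simulated common driver: no direct link between Region 1 and Region 2.
rng = np.random.default_rng(1)
n = 5000
region3 = rng.standard_normal(n)
region1 = region3 + 0.5 * rng.standard_normal(n)
region2 = np.empty(n)
region2[0] = rng.standard_normal()
region2[1:] = region3[:-1] + 0.5 * rng.standard_normal(n - 1)

pairwise = gc_one_lag(region1, region2)                           # spurious
conditional = gc_one_lag(region1, region2, conditioning=region3)  # near zero
```

In this simulation the pairwise estimate suggests an influence from Region 1 to Region 2, which largely disappears once the past of Region 3 is included in both the restricted and the full model.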
Finally, we also investigated functional (i.e., nondirectional) connectivity between a set of nodes in the VSTM network. Within the GC framework, instantaneous influence corresponds to the improvement in predicting the present value of time series A by including the present value of time series B in a linear model already containing the past values of A and B (see, e.g., Appendix A in Roebroeck et al., 2005 or Equation 1-9 in Ding et al., 2006). The instantaneous influence measure is simply a quantification of the remaining linear (nonlagged) dependence between two time series after accounting for linear (lagged) influences between and within the time series. Thus, the instantaneous influence term essentially reflects residual correlations that cannot be assigned a direction based on the temporal information in the data. Although we recorded data at a relatively high fMRI sampling rate (1.5 sec), and it has been demonstrated that GC analysis can detect sub-TR neural influences (Roebroeck et al., 2005), sensitivity to very brief interactions is limited. We therefore applied instantaneous influence analysis on connections not showing GC results consistent with known neural influences (i.e., IFJ to visual cortex, which has been demonstrated to occur within 200 msec after stimulus onset; Zanto et al., 2011). The values estimated from the conditional GC analysis and the instantaneous influence analysis were evaluated for significance in the same fashion as for the bivariate GC analyses.
The behavioral results have been reported in detail elsewhere (Sneve et al., 2012). In summary, average accuracy was 75.7% (SD = 4.1) on the memory discrimination task and 70.7% (SD = 2.7) on the stimulus discrimination task, indicating that the individually estimated 75% thresholds were appropriate. The delay interval length did not have any effect on accuracy on any of the tasks (p > .34). Note that in the previous study we also showed that activity in visual areas drops to baseline during long retention intervals. Furthermore, we demonstrated how a set of late visual areas (V3a/b and LO1/2) are significantly more active during memory encoding compared with other tasks performed on identical stimuli. These analyses will not be repeated here, but the univariate approach used was similar to the procedures described below.
Univariate Analyses—Memory Encoding
We first investigated how the areas outside of visual cortex responded to visual stimuli in the passively viewed version of the experiment. A repeated-measures ANOVA with Bilateral ROIs and Stimulus Type (i.e., corresponding role of the stimulus in the main experiment) as factors, found no main effect of Stimulus Type, F(2, 10) = 3.11; p = .09, nor any interaction, F(14, 70) = .914; p = .55. Although we saw a main effect of ROI, F(7, 35) = 4.24, p = .002, indicating that absolute percentage signal change varies between the different areas, the lack of effects of stimulus type suggests that the passively viewed stimuli were processed in a similar fashion, independent of position in the trial sequence. We therefore collapsed data across stimulus types and tested whether passive viewing of stimuli produced responses above baseline levels in the higher-level brain regions. A set of one-sample t tests against zero found significant bilateral responses in all ROIs (t(5) > 3.31, p < .021), demonstrating that IFJ, FEF, and IPS respond to visual stimuli, also when an explicit task is absent. Note that this observation could not be because of the manner in which ROIs were defined, because the relevant contrast only included data from the main experiment.
To test the hypothesis that IFJ is involved in memory encoding, we compared the responses produced by different conditions in the main experiment (Figure 2). Peak values (normalized to percent signal change) were extracted from the estimated deconvolved responses and submitted to a repeated-measures ANOVA with two factors (Condition and ROI). Both main effects were significant (ROI: F(7, 35) = 8.45, p < .001; Condition: F(2, 10) = 12.37, p = .002), indicating that absolute responses differ between areas and depend on the task associated with the grating stimulus. We also observed a significant interaction, F(14, 70) = 8.69, p < .001, and pursued this effect with a set of planned comparisons (see Methods). When comparing the sample response (i.e., memory encoding condition) with the response to the discrimination test stimuli (i.e., active processing, but no memory task), we found a significantly stronger response during memory encoding in both left and right IFJ (t(5) > 2.97, p < .016), as well as in right FEF (p = .033), but not in any of the other ROIs investigated (paired samples t tests, one-tailed p values). This effect was also observed when contrasting the same conditions' parameter estimates from the model investigating the maintenance phase, in which all events, including the maintenance period, were modeled. Thus, the memory encoding effect in IFJ also holds when potential maintenance-related activity is accounted for. To allow comparison with the encoding effects observed in visual cortex (Sneve et al., 2012), the results from visual area LO2 have also been depicted in Figure 2.
Univariate Analyses—Maintenance Phase
Figure 3 (bottom) shows examples of event-related averages of activity following sample onset in a frontal and a parietal ROI. To investigate if there were differences in delay period activity between the ROIs of interest, the estimated scaling parameters (beta values) for the maintenance regressor were submitted to a repeated-measures ANOVA with one factor (ROI). There was a significant main effect of ROI, F(7, 35) = 8.46, p < .00001, indicating that activity levels differ between the investigated ROIs during memory maintenance. In line with earlier studies, we observed sustained activity throughout the maintenance interval in FEF (Riggall & Postle, 2012; Offen et al., 2010). On a group level, this was significant in both hemispheres (t(5) > 3.07; p < .028). Furthermore, five of six participants demonstrated significant delay activity above baseline in bilateral FEF. Similar patterns of activity were observed in bilateral aIPS, both on a group level (p < .013), and in five of six participants. The same five participants showed sustained activity in both ROIs. Neither IFJ nor pIPS demonstrated consistent delay period activity on a group level (p > .47 and p > .11, respectively), and single-participant analyses found significant bilateral activity in only two of six (IFJ) and three of six (pIPS) participants.
On the basis of the univariate findings, which suggested a role of IFJ in memory encoding and FEF/aIPS in maintenance processes during the delay interval, we included these bilateral ROIs as higher-order nodes in the VSTM network undergoing GC analysis. To restrict the number of tests performed during inferences on the GC results, we further reduced the structural model by collapsing the seven bilateral nodes in visual cortex into two: early visual cortex (bilateral V1, V2, V3, and V4), and late visual cortex (bilateral V3a/b, LO1, and LO2). We kept early and late areas separate because we previously documented, with the same data set, activity modulation during memory encoding in late visual areas, but not in early visual areas (Sneve et al., 2012). To justify averaging within early visual cortex, we ran a set of repeated-measures ANOVAs to test whether the collapsed areas were differentially influenced over the different tasks in the experiment. The model testing for variations between areas had three factors: Early Visual Areas (V1–V4), Experimental Conditions (memory encoding, memory retrieval, discrimination only, and passive viewing), and Higher-level Nodes (IFJ, FEF, aIPS). The estimated GC values between left/right early visual areas and contralateral/ipsilateral Higher-level Nodes were analyzed separately. No differences were found between Early Visual Areas across conditions (i.e., no significant two-way interaction between Visual Area and Condition, F(9, 45) < 1.75, p > .10, nor any three-way interaction, F(18, 90) < 1.11, p > .36), indicating that V1–V4 influence/are influenced by the higher-level nodes in a similar fashion. A corresponding analysis was performed on the late visual areas, also without finding differential patterns of influence (no two-way interaction [p > .57], no three-way interaction [p > .76]).
Hence, for each participant, the estimated GC values representing interactions with the individual early and late visual areas were averaged to yield one value each for early and late visual cortex.
The final structural model thus consisted of five nodes in each hemisphere. We investigated difference of influence values from the pairwise GC analysis between visual nodes in one hemisphere and contralateral/ipsilateral frontoparietal nodes as well as within-hemispheric interactions between the frontoparietal nodes (30 connections in total). The main results from the pairwise GC analysis are shown in Figure 4A–C. Significant influence was observed in both hemispheres from IFJ to FEF and aIPS during memory encoding, and this pattern of influence was absent (FEF) or significantly weaker (aIPS) during the other task operations (Figure 4A). No differential effects across conditions were observed in the interactions between IFJ and contralateral/ipsilateral visual cortex; thus, we cannot prove that the observed effects (i.e., GC values different from zero) are because of real neuronal interactions and not because of differences in hemodynamics between areas. When investigating the instantaneous (i.e., nondirectional) influence between IFJ and visual cortex, however, we observed significantly stronger functional connectivity during memory encoding compared with the other conditions between left IFJ and late visual cortex bilaterally (Figure 4D). Similar but less consistent effects were observed between right IFJ and early/late visual cortex (Figure 4E), as well as between left IFJ and early visual areas.
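To make the difference-of-influence measure concrete, the sketch below computes a pairwise GC influence in each direction between two time series and subtracts them. It is a simplified illustration only: it assumes a first-order autoregressive model fitted by ordinary least squares, it uses simulated data, and all function names are hypothetical. The model order and estimation procedure actually used in this study are those described in Methods.

```python
import math
import random


def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def residual_variance(target, predictors):
    """Mean squared OLS residual of `target` regressed on predictor columns."""
    y = _center(target)
    cols = [_center(p) for p in predictors]
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = _solve(gram, rhs)
    resid = [yt - sum(b * c[t] for b, c in zip(beta, cols)) for t, yt in enumerate(y)]
    return sum(r * r for r in resid) / len(resid)


def granger(x, y):
    """Order-1 influence F(x -> y) = ln(restricted variance / full variance)."""
    y_now, y_past, x_past = y[1:], y[:-1], x[:-1]
    restricted = residual_variance(y_now, [y_past])    # y's own past only
    full = residual_variance(y_now, [y_past, x_past])  # plus x's past
    return math.log(restricted / full)


def difference_of_influence(x, y):
    """Positive values indicate that x -> y dominates y -> x."""
    return granger(x, y) - granger(y, x)


# Simulated example (not study data): y follows x with a one-sample lag.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0, 0.5) for t in range(1, 500)]
print(difference_of_influence(x, y) > 0)
```

Because the restricted model is nested within the full model, each directional term is nonnegative, and the sign of the difference indicates the dominant direction of influence between the two series.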
When investigating GC influences to and from FEF, two contrasting patterns were evident: in a similar fashion as IFJ, early and late visual cortex appeared to drive ipsilateral/contralateral FEF in a bottom–up manner during memory encoding (Figure 4B). During memory retrieval, the direction of influence reversed, as FEF during this task period predicted activity in contralateral late visual cortex. This effect was significantly stronger than during passive viewing of identical stimuli and was found in both cross-hemispheric connections. We could not distinguish the effective connectivity pattern during memory retrieval from that observed during the discrimination only condition. The GC effects in the latter condition were however absent (i.e., not different from zero). Finally, we also observed apparent bottom–up effects between late visual cortex and contralateral/ipsilateral aIPS during memory encoding, but no differences between GC values estimated from other task operations (Figure 4C).
The conditional GC analyses were motivated by the finding from the pairwise GC analyses that both IFJ and visual cortical ROIs influenced FEF and aIPS during memory encoding (Figure 5A). To test the possibility that the bottom–up effects from visual cortex in reality reflect processes initiated by IFJ (e.g., IFJ could influence visual and frontoparietal cortex with different delays; see Methods), we reestimated the GC values between these areas during memory encoding, while conditioning on IFJ (Figure 5B). This analysis, which accounted for the variance explained by IFJ's past, rendered the influence from visual cortex to FEF and aIPS nonsignificant (i.e., not above zero) in both hemispheres, indicating that within the current structural model, activity in FEF and aIPS during memory encoding is predominantly explained by earlier events in IFJ. This interpretation was further supported by a second analysis, in which we calculated GC values between the higher-level nodes while conditioning on visual cortex (Figure 5C). Here, the influence from IFJ to FEF and aIPS remained significant (i.e., above zero) in both hemispheres. The results from the conditional analyses thus suggest that the influences from IFJ to FEF and aIPS are direct and not mediated by visual cortex.
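For reference, the logic of conditioning can be written in the standard Geweke-style form given below. The notation is ours and is meant only as a sketch: a superscript minus denotes the past of a time series, and the model order is left unspecified because the estimation details are those given in Methods.

```latex
F_{X \to Y \mid Z} \;=\; \ln
\frac{\operatorname{var}\!\left(\varepsilon_{Y \mid Y^{-},\, Z^{-}}\right)}
     {\operatorname{var}\!\left(\varepsilon_{Y \mid Y^{-},\, X^{-},\, Z^{-}}\right)}
```

With X taken as visual cortex, Y as FEF or aIPS, and Z as IFJ, a value near zero means that the past of visual cortex adds no predictive information about the target beyond what is already carried by the target's own past and the past of IFJ.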
Finally, when comparing GC values during memory encoding estimated from a subset of “high-performance trials” in which participants produced accurate and rapid responses on the succeeding memory discrimination task, with “low-performance trials” characterized by inaccurate memory discrimination, we observed significantly stronger influence from IFJ to FEF in the left hemisphere during fast correct trials (Figure 4F). This pattern was not observed between IFJ and FEF in the right hemisphere nor in any other connection in the investigated VSTM network.
The results of this study indicate that IFJ is a central initiating node in VSTM encoding. GC analyses of the period following a stimulus to be remembered demonstrated that IFJ predicts activity in FEF and aIPS, areas with persistent levels of activity during memory maintenance intervals. Univariate analyses further confirm that IFJ is strongly involved during memory encoding. Moreover, during memory retrieval, FEF influences a set of contralateral late retinotopic visual areas, suggestive of a role in reinstating sensory aspects of the remembered stimulus when it becomes relevant for behavior.
Several studies have found involvement of IFJ in VSTM encoding (Todd, Han, Harrison, & Marois, 2011; Bollinger, Rubens, Zanto, & Gazzaley, 2010; Roth, Serences, & Courtney, 2006). Recently, Zanto et al. (2011) demonstrated that repetitive TMS over IFJ led to poorer performance on a subsequent delayed-recognition task, which, as in this study, involved remembering simple visual feature information. The behavioral impairment had a neural correlate in the P1 component of the ERP produced over posterior electrodes during memory encoding, which was weakened following TMS to IFJ. This observation suggests that IFJ exercises top–down control over perceptual areas to promote the successful establishment of task-relevant representations, perhaps by modulating feature processing (Zanto, Rubens, Bollinger, & Gazzaley, 2010) and/or object information (Lepsien & Nobre, 2007). In line with the postulated role of IFJ in establishing memory representations, univariate analyses showed that activity levels in IFJ are higher during memory encoding compared with other tasks performed on similar stimuli. This differential pattern of activity resembles the responses produced by a set of late visual areas (V3a/b, LO1-2; Sneve et al., 2012). However, we do not find any evidence for stronger GC influence between IFJ and these areas during memory encoding compared with other operations on similar stimuli. One explanation for this finding may be that the sampling interval in the current study (1.5 sec) was too long to detect influences between IFJ and sensory areas, which take place within 200 msec after stimulus onset (Zanto et al., 2011). Although GC analysis of fMRI signals can detect interactions between neural populations with temporal extents and delays smaller than the sampling resolution (Roebroeck et al., 2005), the sensitivity of the analysis decreases with increased TR.
The observation of increased instantaneous influence (i.e., nondirectional functional connectivity) between IFJ and visual cortex during memory encoding suggests that influence might be present on the neural level, but at too fine a timescale to be reflected in the fMRI signal. Alternatively, influences from IFJ to visual cortex could be present during all operations investigated; IFJ has been implicated in several tasks requiring updating and representation of task rules (Brass et al., 2005), and the test phase of the present experiment involved adapting to the upcoming cued task.
Nevertheless, IFJ's influence on FEF and aIPS during the memory encoding phase fits well with the univariate patterns of activity, suggesting a primary role of IFJ in encoding, and FEF/aIPS in maintenance-related processes. To our knowledge, we are the first to investigate effective connectivity between these areas during VSTM operations, although they are commonly reported as central nodes in VSTM networks (Todd et al., 2011; Zanto et al., 2011; Offen et al., 2010; Linden, 2007; Pessoa, Gutierrez, Bandettini, & Ungerleider, 2002). We suggest that the influence of IFJ on FEF and aIPS reflects top–down triggering of maintenance operations in these areas. The correspondence between degree of influence from IFJ to FEF in the left hemisphere during memory encoding and later success on the memory discrimination task points to a functional significance of these top–down signals. We do not find similar effects in the right hemisphere, which could seem to be in conflict with the findings from Zanto et al. (2011), who found impaired color encoding after repetitive TMS to the right IFJ. However, in the same study, the researchers observed recruitment of IFJ bilaterally when a different feature (motion) was encoded into memory, and other groups have reported involvement of mainly the left IFJ during encoding of yet other types of visual stimuli (Todd et al., 2011).
What is the nature of the representations being maintained in FEF and aIPS? The most frequent reports of sustained activity in FEF and IPS during memory intervals come from studies investigating VSTM tasks with spatial requirements, such as maintenance of saccade goals (Curtis & D'Esposito, 2006; Bruce & Goldberg, 1985) or spatial memory tasks (Armstrong, Chang, & Moore, 2009; Srimal & Curtis, 2008; Kastner et al., 2007). These observations fit well with the finding that FEF contains maps of visual space in retinotopic coordinates (Bruce & Goldberg, 1985) and microstimulation of FEF neurons biases processing of objects presented in the neurons' preferred locations (Moore & Fallah, 2001), most likely through interactions with visual cortex (Moore & Armstrong, 2003). aIPS has similar characteristics (Bressler et al., 2008; Swisher, Halko, Merabet, McMains, & Somers, 2007), although it is more strongly influenced by bottom–up visual input than FEF (Ruff et al., 2008). Activity levels in FEF, however, persist throughout memory intervals also in tasks not probing spatial location information (Riggall & Postle, 2012; Offen et al., 2010), and in this study, we have confirmed that this also is the case for aIPS (Xu & Chun, 2006; Pessoa et al., 2002). A recent study investigating the effect of parametrical reductions of memory load during retention periods found corresponding activity modulations in FEF and aIPS (Lepsien, Thornton, & Nobre, 2011). Importantly, the memoranda in this study were not associated with unique spatial information, indicating that the load-related activity changes reflected reduced rehearsal demands along a nonspatial stimulus dimension (see also Xu & Chun, 2006; their Experiment 4). Thus, although a prevailing view on FEF and IPS function in VSTM maintenance has been related to rehearsal operations in space (e.g., Curtis, 2006), this mode of functioning may be limited to spatial VSTM tasks.
The recent discovery that featural VSTM recruits retinotopic visual cortex throughout memory intervals (Harrison & Tong, 2009) suggests persistent feedback from higher-order areas. One mechanism that could be underlying such maintenance processes is feature-based attention, the selective enhancement of responses from neurons preferring an attended feature (Maunsell & Treue, 2006). Using a pattern classification analysis approach, we have confirmed with the current task that representations of the task-relevant feature (sample orientation in a trial) are held active in visual areas during the memory delay interval (Sneve et al., 2012). Feature-based attention is known to strengthen orientation representations in visual cortex (Liu, Larsson, & Carrasco, 2007), and its effects are separable from the general enhancement resulting from allocation of spatial attention to a part of the visual field (Jehee, Brady, & Tong, 2011). In a recent study, Liu, Hospadaruk, Zhu, and Gardner (2011) demonstrated how feature-based attention to low-level aspects of visual stimuli involved FEF and IPS, in addition to retinotopic visual areas. Moreover, using multivariate pattern analysis techniques, they were able to decode the characteristics of the attended feature from these frontoparietal areas, indicating that FEF and IPS may contain priority maps of visual features in addition to locations (Bollinger et al., 2010; see also Serences & Boynton, 2007). Recent models of VSTM have focused on the similarities between selective attention operations on perceptual input and processes associated with the maintenance of internal representations, suggesting that they are subserved by the same neural mechanisms (Gazzaley & Nobre, 2012; Chun & Johnson, 2011). 
In line with this proposal, we suggest that FEF and aIPS, areas showing sustained activity during memory delay intervals in this study, are probable sources for top–down feature-specific maintenance signals to visual cortex when VSTM representations are to be kept alive for extended periods of time (see also Kuo, Yeh, Chen, & D'Esposito, 2011).
The observed influence from FEF to late visual cortex during memory retrieval resonates well with the finding that stimulation of FEF during difficult detection tasks leads to facilitated stimulus processing (Ruff et al., 2006; Grosbras & Paus, 2002, 2003; Moore & Fallah, 2001). Several studies have found spatially specific modulation of neural responses in visual cortex following FEF stimulation (Ekstrom, Roelfsema, Arsenault, Bonmassar, & Vanduffel, 2008; Ruff et al., 2008; Taylor, Nobre, & Rushworth, 2007; Moore & Armstrong, 2003), indicating that FEF can enhance sensitivity to visual input through top–down interactions with the relevant sensory populations. Using fMRI and a GC analysis approach, Bressler et al. (2008) demonstrated how expectation of an upcoming, task-relevant oriented grating was associated with top–down influences from FEF and IPS to visual areas, particularly to areas late in the processing hierarchy. Such expectancy-generated effects could explain why we observe influence to late visual cortex, but not V1–V4, in the retrieval condition. In our task, the tone cue presented immediately before memory discrimination may initiate allocation of attentional resources via FEF to the relevant sensory neurons in late visual cortex. It has been shown that early visual areas are modulated by FEF only when they receive concurrent visual input, whereas higher-order visual areas are influenced independent of the presence of visual stimuli (Ekstrom et al., 2008). Initiation of retrieval at auditory cueing should thus primarily recruit late visual cortex. Similar observations can explain why we do not find top–down effects from aIPS during retrieval, as IPS affects higher-order visual areas only during visual stimulation (Ruff et al., 2008).
We do not find significant differences in top–down influence from FEF between memory retrieval and the discrimination only condition; thus, we cannot conclude whether FEF's role in retrieval is to boost remembered representations or to facilitate processing of the memory test stimulus by increasing sensitivity in appropriate parts of the visual field. However, if the latter were the case, we would have expected similar consistent patterns of influence in the discrimination only condition, which also would have benefitted from spatial prioritization.
We are aware of the current debate surrounding the use of lag-based methods to infer effective connectivity from fMRI data (e.g., Smith et al., 2011). Recent simulation studies have however demonstrated that, when significant GC influences are observed on a group level, these effects are seldom false positives (Schippers, Renken, & Keysers, 2011; see also Deshpande & Hu, 2012). Furthermore, in this study, we only draw conclusions from GC results that differ significantly from the “passive viewing” condition. The hemodynamic transfer function is known to be stable within a region across experimental contexts (Miezin, Maccotta, Ollinger, Petersen, & Buckner, 2000). Within-region differences in BOLD latencies across conditions should therefore reflect changes in the temporal characteristics of the underlying neural processes. We acknowledge that our study may contain false negatives, because of the low sensitivity of fMRI-GC analysis to brief neural interactions (e.g., Seth, Chorley, & Barnett, 2013). The fact that we observe significant GC effects in our study may thus indicate that the underlying interactions occur with long latencies, perhaps reflecting processes requiring intraregional computations (e.g., Buffalo, Fries, Landman, Liang, & Desimone, 2010).
To summarize, IFJ has a central role in initiating VSTM representations, possibly by establishing feature-specific priority maps in FEF/aIPS. We suggest that the latter regions interact with sensory areas to maintain representations through feature-based attention mechanisms. This interaction culminates in top–down enhancement of the memory trace when it becomes necessary for adaptive behavior.
We would like to thank Andrew Kayser and Ian Cameron for helpful comments on the GC analyses. We also thank two anonymous reviewers for their constructive input on the manuscript. This work was supported by a grant from the University of Oslo to the Center for the Study of Human Cognition.
Reprint requests should be sent to Markus H. Sneve, Center for the Study of Human Cognition, Department of Psychology, P.O. Box 1094 Blindern, 0317 Oslo, Norway, or via e-mail: firstname.lastname@example.org.