Although the content of working memory (WM) can be decoded from the spatial patterns of brain activity in early visual cortex, how populations encode WM representations remains unclear. Here, we address this limitation by using a model-based approach that reconstructs the feature encoded by population activity measured with fMRI. Using this approach, we could successfully reconstruct the locations of memory-guided saccade goals based on the pattern of activity in visual cortex during a memory delay. We could reconstruct the saccade goal even when we dissociated the visual stimulus from the saccade goal using a memory-guided antisaccade procedure. By comparing the spatiotemporal population dynamics, we find that the representations in visual cortex are stable but can also evolve from a representation of a remembered visual stimulus to a prospective goal. Moreover, because the representation of the antisaccade goal cannot be the result of bottom–up visual stimulation, it must be evoked by top–down signals presumably originating from frontal and/or parietal cortex. Indeed, we find that trial-by-trial fluctuations in delay period activity in frontal and parietal cortex correlate with the precision with which our model reconstructed the maintained saccade goal based on the pattern of activity in visual cortex. Therefore, the population dynamics in visual cortex encode WM representations, and these representations can be sculpted by top–down signals from frontal and parietal cortex.
Working memory (WM) enables the brain to maintain behaviorally relevant information over short periods of time while the source of the information is no longer available (Curtis & D'Esposito, 2003; Baddeley & Hitch, 1974). Much of the previous work investigating the neural mechanisms that support WM has focused on the role of the frontal cortex (Srimal & Curtis, 2008; Goldman-Rakic, 1994; Funahashi, Bruce, & Goldman-Rakic, 1989). However, the contents of WM (e.g., orientation, color, spatial frequency, motion) can be decoded from delay period activity in human visual cortex (Albers, Kok, Toni, Dijkerman, & de Lange, 2013; Riggall & Postle, 2012; Ester, Serences, & Awh, 2009; Harrison & Tong, 2009; Serences, Ester, Vogel, & Awh, 2009). Findings such as these support the sensory recruitment model of WM, whereby the same neural mechanisms that encode sensory information are also used to maintain WM representations (D'Esposito & Postle, 2015; Postle, 2006).
Because these decoding studies depend on the patterns of activity across hundreds of voxels, visual cortex appears to encode WM representations within large populations of neurons. Multivariate pattern analysis (MVPA) (Haxby et al., 2001) uses powerful supervised machine learning algorithms to classify distinct brain states (Haynes & Rees, 2006; Norman, Polyn, Detre, & Haxby, 2006; Kamitani & Tong, 2005). Although MVPA has successfully demonstrated that there are different patterns of activation in visual cortex associated with different WM features, it is limited in constraining the number of inferences one can draw about why these patterns are different (Serences & Saproo, 2012; Freeman, Brouwer, Heeger, & Merriam, 2011; Naselaris, Kay, Nishimoto, & Gallant, 2011). In contrast to MVPA, which attempts to predict brain states given a pattern of activity, an inverted encoding model (IEM) uses a set of a priori assumptions that form a linking hypothesis about how these brain states give rise to different patterns (Brouwer & Heeger, 2009; Dumoulin & Wandell, 2008). For example, if one assumes that a voxel's BOLD response during a spatial WM delay period is proportional to the responses of a set of information channels that tile visual space, the mapping between these channels and the BOLD responses can be used to predict the location of the memoranda in the visual field (Sprague, Ester, & Serences, 2014, 2016; Ester, Sprague, & Serences, 2015; Sprague & Serences, 2013). IEMs can reconstruct a full feature space, visual space in the above example, that can then be used for decoding a maintained stimulus feature (e.g., location).
Here, we model the patterns of neural population dynamics in early visual cortex to test several hypotheses. First, we ask if we can reconstruct the locations of memory-guided saccade goals based on the pattern of activity in visual cortex during a memory delay. Second, we ask if we can reconstruct the saccade goal even when we dissociate the location of the visual stimulus from the saccade goal using a memory-guided antisaccade procedure. Third, we test if the population dynamics evolve from a retrospective visual code to a prospective goal code. Fourth, because the representation of the antisaccade goal cannot be the result of bottom–up visual stimulation, it must be evoked by top–down signals, presumably originating from frontal cortex (Moore & Armstrong, 2003). To test this, we ask if trial-by-trial fluctuations in delay period activity in frontal cortex correlate with the precision with which our model reconstructs the maintained saccade goal based on the pattern of activity in visual cortex.
Using procedures approved by New York University, five healthy participants (three women, right-handed, 25–42 years old) gave informed consent and participated in the study. Each volunteer participated in six scanning sessions: one to obtain three high-resolution anatomies, one to measure retinotopy, and four to measure responses during memory-guided prosaccade and antisaccade tasks. Portions of the data were previously published in Saber, Pestilli, and Curtis (2015).
Memory-guided Saccade Tasks
Participants performed two memory-guided saccade tasks (Figure 1). Participants were instructed to remember the location of a briefly presented visual stimulus and immediately plan a saccade to, in the case of memory-guided prosaccades, or away from the stimulus, in the case of memory-guided antisaccades. The stimulus, a high-contrast flickering visual checkerboard (1° diameter, spatial frequency 2 cpd, 8 Hz flicker), appeared at one of eight locations separated by 30° angles at 10° eccentricity, avoiding vertical and horizontal meridians. Each trial began with 1.25 sec of central fixation followed by a brief presentation of the visual stimulus. Participants maintained the planned prosaccade or antisaccade over a long and variable delay period (3–15 sec). Disappearance of the fixation along with a brief auditory tone prompted the memory-guided saccade. Participants held their gaze until the correct location was presented as feedback and corrected any error in their gaze. Participants then returned to fixation for the intertrial interval (10.5–15 sec). Partial trials were also included to help break the dependency of the delay on the target. These trials were aborted after the visual stimulus, signaled by a change in the color of the fixation spot. In four sessions, each with eight 322-sec scans, participants performed 192 prosaccade and 192 antisaccade trials, blocked by condition.
Eye position was recorded in the scanner at 1000 Hz (EyeLink 1000, SR Research, Ontario, Canada). We computed the error, in degrees of visual angle, of the primary saccade as well as the final error after any quick corrections were made before the target feedback. All trials were inspected for noncompliance. Trials in which participants made an eye movement to the wrong saccade target were rare (0.3% and 0.6% of all pro- and antisaccade trials, respectively). We excluded trials in which gaze deviated from fixation by more than 2° during the delay period or immediately following the visual target (4.6% and 2.7% of pro- and antisaccade trials, respectively).
All MRI data were acquired on a 3-T Allegra head-only scanner (Siemens, Erlangen, Germany) using a head transmit coil (NM-011) and two surface receive coils (NOVA Medical, Wakefield, MA): (1) a four-channel phased array receive surface coil (NMSC-021) for retinotopic mapping and (2) a four-element phased parallel array (NMSC-011) for memory-guided saccade tasks. We acquired T2*-sensitive echo-planar images (repetition time = 1.5 sec, echo time = 30 msec, flip angle = 75°, 26 slices, 3 × 3 × 3 mm voxels, 192 × 192 mm field of view). Three T1-weighted MPRAGE scans were averaged and used for gray matter segmentation, cortical flattening, registration, and visualization for creating each ROI (see below). Functional scans were corrected for head motion and aligned across sessions, detrended and high-pass filtered with a cutoff frequency of 0.0167 Hz, and converted to percentage signal modulation.
We defined retinotopic visual areas (V1, V2, V3, V3A/B) and the first intraparietal sulcus map (IPS0) using a standard phase encoded approach (Wandell, Dumoulin, & Brewer, 2007). Polar angle components of the retinotopic maps were estimated with a high-contrast flickering wedge that rotated (clockwise or counterclockwise) around central fixation (subtended 90° of polar angle). Radial components were estimated with a high-contrast flickering annulus expanding or contracting from fixation (subtended 4.2° radius). Each scan consisted of 10.5 cycles with a period of 24 sec and lasted 252 sec. Each participant completed six to eight runs of the rotating wedge aperture and four runs of expanding/contracting annulus aperture. We used flattened cortical surface representations of each participant's occipital and parietal cortices to visualize amplitude, coherence, and phase maps. Visual area boundaries were drawn on retinotopic maps based on standard conventions (Wandell et al., 2007; Larsson & Heeger, 2006). We used voxels from these ROIs in our IEM described below.
Estimating Delay Period Activity
We used a voxel-wise generalized linear model (GLM) to estimate the responses of each voxel to the stimulus encoding, memory retention interval (i.e., delay period), and memory-guided saccade intervals of each trial (Srimal & Curtis, 2008). Each epoch was modeled by the convolution of a canonical model of the hemodynamic impulse response function with a square wave (box-car regressor) whose duration was equal to the duration of the corresponding interval, detrended and high-pass filtered. Importantly, we estimated beta coefficients for each of the three trial epochs (stimulus, delay, response) for every trial independently. Modeling every event in each trial guaranteed that the beta coefficient corresponding to each epoch estimated the magnitude of BOLD activity specific to each epoch. As described below, we used the beta coefficients from the stimulus encoding epoch of the prosaccade task for model training and the beta coefficients from the delay periods of the prosaccade and antisaccade tasks for model testing.
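The single-trial estimation described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the double-gamma HRF parameters, the function names, and the absence of nuisance regressors are all assumptions made for the example. Each epoch of each trial gets its own boxcar regressor, which is convolved with the HRF and entered into an ordinary least-squares fit.

```python
import math
import numpy as np

TR = 1.5  # repetition time in seconds, as reported for the functional scans


def canonical_hrf(tr, duration=30.0):
    """Double-gamma HRF sampled every `tr` seconds (parameter values are assumptions)."""
    t = np.arange(0.0, duration, tr)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)          # positive lobe, peaks near 5 s
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)  # late undershoot
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()


def epoch_regressor(onset, duration, n_vols, tr=TR):
    """Boxcar covering one epoch, convolved with the HRF and cut to the scan length."""
    boxcar = np.zeros(n_vols)
    start = int(round(onset / tr))
    stop = max(start + 1, int(round((onset + duration) / tr)))
    boxcar[start:stop] = 1.0
    return np.convolve(boxcar, canonical_hrf(tr))[:n_vols]


def single_trial_betas(bold, epochs, tr=TR):
    """Least-squares betas, one per (trial, epoch) regressor.

    bold   : array (n_vols, n_voxels), already detrended and filtered
    epochs : list of (onset_sec, duration_sec), one entry per regressor
    """
    n_vols = bold.shape[0]
    columns = [epoch_regressor(on, dur, n_vols, tr) for on, dur in epochs]
    design = np.column_stack(columns + [np.ones(n_vols)])  # last column: constant
    betas, *_ = np.linalg.lstsq(design, bold, rcond=None)
    return betas[:-1]  # drop the constant term
```

Listing the stimulus, delay, and response epochs of every trial as separate entries in `epochs` yields one beta per epoch per trial, which is the quantity the training and testing steps operate on.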
Inverted Encoding Model
Based on the spatial selectivity of voxels in each ROI in early visual cortex, we used 56 2-D Gaussian filters with equal FWHMs distributed uniformly around a circle corresponding to a 10° eccentricity (i.e., eccentricity for the visual targets and saccade goals) to construct a set of basis functions. The number and width of these basis functions were chosen in a way to both avoid rank deficiency (caused by too much overlap between channels) and obtain smooth reconstructions. To make sure that using basis functions restricted to the stimulated ring does not bias the reconstructed spatial representation, we also tested a set of basis functions tiling the whole visual field. Because we did not find any meaningful differences in the reconstructed visuospatial maps, here we only report the results from the basis functions restricted to the actual eccentricity used in the experiment.
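To make the train-then-invert logic of an IEM concrete, the sketch below uses a deliberately simplified setup. It is not the model fit in this study: it assumes 1-D Gaussian channels over polar angle instead of 2-D Gaussians in visual space, leaves the number of channels to the caller, and uses a pseudo-inverse for both steps. All function names are hypothetical. Training estimates voxel-by-channel weights W from B = WC, and testing inverts W to recover channel responses from new BOLD patterns.

```python
import numpy as np


def circular_distance(a, b):
    """Smallest absolute angular difference in degrees."""
    return np.abs((np.asarray(a) - np.asarray(b) + 180.0) % 360.0 - 180.0)


def channel_responses(stim_angles, centers, fwhm):
    """Predicted channel responses, shape (n_channels, n_trials)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    d = circular_distance(centers[:, None], np.asarray(stim_angles)[None, :])
    return np.exp(-0.5 * (d / sigma) ** 2)


def train_iem(bold_train, chan_train):
    """Estimate voxel-by-channel weights W from B = W C by least squares."""
    return bold_train @ np.linalg.pinv(chan_train)


def invert_iem(weights, bold_test):
    """Estimate channel responses for new data by inverting the weights."""
    return np.linalg.pinv(weights) @ bold_test


def decode_angle(chan_est, centers):
    """Read out the center of the most active channel on each trial."""
    return centers[np.argmax(chan_est, axis=0)]
```

Here `bold_train` and `bold_test` are voxel-by-trial matrices of beta coefficients. The estimated channel responses returned by `invert_iem` are the coefficients that weight the basis functions when a reconstruction is drawn.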
To evaluate the differences in behavioral performance between pro- and antisaccade tasks, we measured three metrics for each trial: the primary and final saccade accuracies and the RT. For each metric, we shuffled the data, mixing up the prosaccade and antisaccade trials, and calculated a t statistic that compared the shuffled “prosaccade” and “antisaccade” metrics. We did this 200 times to create a null distribution of t statistics that we used to compare to the t statistics computed using the veridically labeled data.
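The label-shuffling procedure can be sketched as below. The choice of a Welch-type t statistic, the two-sided p value with a +1 correction, and the function names are assumptions made for illustration; the text specifies only that a t statistic was recomputed on 200 shuffles.

```python
import numpy as np


def t_statistic(a, b):
    """Two-sample t statistic with unequal variances (Welch form, an assumption)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se


def permutation_t_test(pro, anti, n_perm=200, seed=0):
    """Compare the observed t statistic with a null built from shuffled labels."""
    rng = np.random.default_rng(seed)
    observed = t_statistic(pro, anti)
    pooled = np.concatenate([pro, anti])
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(pooled)  # mix pro- and antisaccade trials
        null[i] = t_statistic(shuffled[:len(pro)], shuffled[len(pro):])
    # two-sided p value; the +1 terms count the observed labeling itself
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
    return observed, p
```

The same function would be applied separately to each behavioral metric (primary saccade accuracy, final saccade accuracy, and RT).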
To verify that the IEM can reconstruct the WM content uniformly well across different saccade target locations, we plotted the model's reconstructed locations versus the true target locations. Within each topographic map, we calculated the peak of the reconstructed heat map averaged across all trials for each saccade target location and plotted this against the true target locations. Then, we estimated the slope of lines fit to these points; the closer the slope was to 1, the more precise the readout across different locations. Moreover, to evaluate the model performance on a trial-by-trial basis, we defined the readout precision at each trial as , where “error” is the difference between the angular position of the peak in the reconstructed retinotopic map and the actual location in the corresponding trial. The precision varies between 0 (maximum error) and 1 (minimum error). We then measured the skewness of the distribution of precisions. The higher the trial-by-trial model performance, the more the distribution of precisions was skewed toward 1. To estimate significance, we compared the skewness of the actual data with a permuted distribution of skewness values. We shuffled the order of the trials 1000 times and, during each iteration, recomputed the skewness to form the permuted distribution of skewness values. Moreover, we identified correctly decoded trials based on the distance between the IEM readout peak and the actual target location in each trial. A trial was counted as correctly decoded if the decoded location was within 15°, 30°, or 45° (three correctness bounds) of the true location in that trial. We then used a permutation test to compare the proportion of correctly decoded trials between the original and randomized order of data in each ROI. To do so, we shuffled the trial orders 1000 times, for each participant and each ROI, and reconstructed the visual–spatial maps each time. Then, we ran a Student's t test between the proportions of correctly decoded trials for the original and shuffled orders.
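The trial-wise quantities used in these tests can be illustrated with a short sketch. It computes the angular error between decoded and true locations, the proportion of trials falling within each correctness bound, and a sample skewness. The helper names are hypothetical, and the mapping from error to precision is intentionally left out because it is not specified here; `sample_skewness` can be applied to whatever precision values are in use.

```python
import numpy as np


def angular_error(decoded_deg, true_deg):
    """Absolute angular difference in degrees, wrapped to the range 0 to 180."""
    diff = np.asarray(decoded_deg, float) - np.asarray(true_deg, float)
    return np.abs((diff + 180.0) % 360.0 - 180.0)


def proportion_correct(decoded_deg, true_deg, bounds=(15.0, 30.0, 45.0)):
    """Fraction of trials whose decoded location falls within each bound."""
    err = angular_error(decoded_deg, true_deg)
    return {b: float(np.mean(err <= b)) for b in bounds}


def sample_skewness(values):
    """Third standardized moment; negative when the long tail points to low values."""
    x = np.asarray(values, float)
    centered = x - x.mean()
    return float(np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5)
```

A distribution of precisions concentrated near 1 with a tail toward 0 gives a negative skewness, which is the sign convention reported in the Results.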
We also tested the spatiotemporal interaction of the reconstructed representation of population activity under different conditions. To measure how reconstruction precision changes from encoding to delay periods at the stimulus and saccade target locations, we fit a 2-D Gaussian to the reconstructed visual–spatial maps and measured the gain, width, and precision of the fit. We then used a two-way ANOVA to test the significance of the interaction between each of these parameters at two time epochs, the encoding and delay periods, and two locations, the stimulus location and saccade goal, which is the same on prosaccade but different on antisaccade trials.
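For a balanced 2 × 2 design such as Epoch × Location, the interaction term of the ANOVA can be computed directly, as in the sketch below. It assumes the fitted gains (or widths, or precisions) have already been arranged with an equal number of observations per cell; the Gaussian fitting step is omitted and the function name is hypothetical.

```python
import numpy as np


def interaction_f(cells):
    """F statistic for the interaction in a balanced 2 x 2 design.

    cells : array (2, 2, n) with n observations per Epoch x Location cell
    Returns the F value and its (numerator, denominator) degrees of freedom.
    """
    cells = np.asarray(cells, float)
    n = cells.shape[2]
    cell_means = cells.mean(axis=2)
    grand = cell_means.mean()
    row = cell_means.mean(axis=1, keepdims=True)   # Epoch marginal means
    col = cell_means.mean(axis=0, keepdims=True)   # Location marginal means
    ss_inter = n * np.sum((cell_means - row - col + grand) ** 2)
    ss_error = np.sum((cells - cell_means[:, :, None]) ** 2)
    df_error = 4 * (n - 1)
    return ss_inter / (ss_error / df_error), (1, df_error)
```

A crossover pattern, in which gain is high at the stimulus location during encoding and high at the saccade goal during the delay, produces a large interaction F even when the two main effects are absent.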
Finally, to identify voxel clusters across the whole brain whose activity significantly correlated with model performance in an ROI, we calculated the Pearson correlation, across trials and for every voxel in the brain, between the delay period GLM beta coefficients and the trial-by-trial success of our model's decoding of memorized space. We then used a permutation test to compare the z-transformed correlations for the original and randomized order of trials (corrected for multiple comparisons across the five ROIs). Next, we compared these clusters across all participants to find brain areas whose delay period activity correlates with the model precision in each of the five ROIs. We identified two clusters of voxels from two participants as the same if (1) the distance between the boundaries of the two clusters was not larger than two voxels in the Talairach TT_N27 brain space and (2) both clusters were in the same brain area according to FreeSurfer's cortical labels (aparc.a2009s.annot) from the automatic cortical parcellation.
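The voxel-wise correlation step can be sketched as follows. The correlation and Fisher z-transform follow the description above; the way the permutation null is summarized (the maximum |z| across voxels on each shuffle) is an assumption chosen for the example, and the clustering and cross-participant matching steps are not shown. Function names are hypothetical.

```python
import numpy as np


def voxelwise_correlation(betas, precision):
    """Pearson r between each voxel's delay betas and trial-wise model precision.

    betas     : array (n_trials, n_voxels)
    precision : array (n_trials,)
    """
    b = betas - betas.mean(axis=0)
    p = precision - precision.mean()
    return (b.T @ p) / (np.linalg.norm(b, axis=0) * np.linalg.norm(p))


def fisher_z(r):
    """Fisher z-transform of correlation coefficients."""
    return np.arctanh(r)


def permutation_threshold(betas, precision, n_perm=1000, alpha=0.05, seed=0):
    """Threshold on |z| from shuffled trial order (max across voxels, an assumption)."""
    rng = np.random.default_rng(seed)
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(precision)  # break the trial-by-trial pairing
        z = fisher_z(voxelwise_correlation(betas, shuffled))
        null_max[i] = np.max(np.abs(z))
    return np.quantile(null_max, 1.0 - alpha)
```

Voxels whose |z| exceeds the returned threshold would then be passed on to the clustering step.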
Performance Equated on Memory-guided Pro- and Antisaccades
Figure 1B depicts the behavioral results for both pro- and antisaccade trials. Overall, 85% and 91% of all memory-guided prosaccade and antisaccade RTs (SRTs), respectively, fell within the normal range of visually guided saccades (i.e., 200–500 msec), and SRT did not significantly differ between the conditions (permutation test, p = .17). The accuracy of memory-guided prosaccades and antisaccades was also indistinguishable (permutation test, p = .56). Moreover, the final eye position after small corrective saccades (Mackey, Devinsky, Doyle, Golfinos, & Curtis, 2016; Mackey, Devinsky, Doyle, Meager, & Curtis, 2016), but before feedback, was not significantly different between the two conditions (permutation test, p = .53). We excluded trials from further analyses when the position of gaze deviated by more than 2° of visual angle away from the central fixation point during the delay period. To test whether smaller deviations from fixation are informative about the planned saccade location, we calculated the correlation between the angle of gaze deviations during different time epochs of the delay period and the angle of the saccade target. In none of the participants did these correlations reach significance (mean r = −.05, range: r = −.09 to r = .10). Thus, we were able to compare neural differences unconfounded by behavioral differences between the prosaccade and antisaccade conditions and unconfounded by the influences of small gaze deviations.
Modeled Population Activity in Visual Cortex Reconstructs a Representation of WM
We collected BOLD fMRI signal while participants performed memory-guided pro- and antisaccade tasks (Figure 1A). In both tasks, each trial can be divided into four epochs: central fixation, stimulation (stimulus encoding), delay period, and response. We mainly focused on the stimulus encoding and delay periods, as these are the intervals during which the WM content is encoded and maintained. We used an IEM (Sprague & Serences, 2013), described in detail in Methods, to reconstruct the spatial representation of neural population activity within each retinotopic map during the encoding and delay periods independently for both prosaccade and antisaccade trials (Figure 1C).
In the memory-guided prosaccade task, the participant is instructed to make a saccade toward the stimulus location after a variable delay period. We hypothesized that the activity patterns of neural population ensembles encode and maintain the stimulus location. Utilizing an IEM, we reconstructed the spatial representation of population activity in the five identified retinotopic areas. To average over all possible stimulus locations, we rotated the estimated channel coefficients, aligning all trials to one location, 45° in the upper right quadrant. Figure 2 shows the reconstruction results for a sample participant (Figure 2A) and for the average of all participants (Figure 2B) during both encoding and delay epochs. The lines on the polar plot depict the relative channel coefficients across the circular space of the model. The heat maps in the background show the reconstructed spatial representation of the neural population activity, which is the linear combination of the information channels, each multiplied by its corresponding coefficient. In all retinotopic areas we identified, the IEM yielded a high-fidelity reconstruction of the stimulus location during both time periods. This suggests that the population activity in these retinotopically organized cortical areas encodes and maintains the stimulus location.
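The alignment step, in which channel coefficients from trials with different stimulus locations are rotated to a common reference before averaging, amounts to a circular shift. The sketch below assumes that channel i is centered at i × (360/56)° and that each stimulus angle is rounded to the nearest channel; both are illustrative assumptions, as are the function names.

```python
import numpy as np

N_CHANNELS = 56                 # number of basis functions reported in the text
STEP = 360.0 / N_CHANNELS       # angular spacing between channel centers


def align_to_reference(coefs, stim_angle, reference_angle=45.0):
    """Circularly shift one trial's channel coefficients so that the stimulus
    location lands on the channel nearest `reference_angle`."""
    shift = int(round((reference_angle - stim_angle) / STEP))
    return np.roll(coefs, shift)


def average_aligned(coef_matrix, stim_angles, reference_angle=45.0):
    """Mean of aligned coefficient profiles; coef_matrix is (n_trials, n_channels)."""
    aligned = [align_to_reference(c, a, reference_angle)
               for c, a in zip(coef_matrix, stim_angles)]
    return np.mean(aligned, axis=0)
```

With 56 channels, the 45° reference coincides with channel index 7, so an averaged profile that peaks at that index indicates a reconstruction centered on the aligned stimulus location.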
Next, we quantified the accuracy and precision of the model in each retinotopic area by estimating the trial-by-trial performance (Figure 3). For each trial, we estimated the accuracy by comparing the peak of the reconstructed visual space within each retinotopic area with the true stimulus location, separately at the encoding and delay epochs. To do this over all stimulus locations, we regressed the decoded locations, averaged over all trials with the same stimulus location, against the true locations. In all five retinotopic areas, we found a tight coupling between reconstructed location and true location as demonstrated by regression slopes that were near 1 (range across ROIs [0.94, 1.05], with largest p = 1.6 × 10^−6). The top row of Figure 3 depicts the relation between the decoded location and true location in a sample retinotopic area (V3) during both encoding and delay epochs. We defined the precision of the model as the inverse of the decoded errors across trials, where smaller decoding error approached a precision of 1. The distribution of precision over trials was highly skewed toward 1 for each of the retinotopic areas (range across ROIs [−1.36, −0.51], with largest p = 1.2 × 10^−13; note that negative numbers indicate skewness toward 1). The middle row of Figure 3 depicts the distribution of precision in a sample retinotopic area (V3) during the encoding and delay epochs. Moreover, using various thresholds, we asked how often the model correctly decoded the stimulus location within a correctness boundary of 15°, 30°, or 45° of the true location. The bottom row of Figure 3 depicts the percentage of correctly decoded trials at different correctness boundaries. For all retinotopic areas, the percentage of correctly decoded trials was significantly above chance, even at the narrowest boundary of 15° (largest p = 2 × 10^−118 for all correctness boundaries). Overall, our model of the population activity in these retinotopically organized cortical areas accurately and precisely represented the location of the encoded and maintained stimulus.
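The slope measure reported above can be illustrated with a short sketch. It averages decoded locations within each true location and fits a straight line with `numpy.polyfit`. For simplicity it uses a linear mean and ignores wraparound at 0°/360°, which is an assumption that would need care for decoded values scattered across that boundary; the function name is hypothetical.

```python
import numpy as np


def readout_slope(decoded_deg, true_deg):
    """Slope and intercept of a line relating mean decoded to true location."""
    true_deg = np.asarray(true_deg, float)
    decoded_deg = np.asarray(decoded_deg, float)
    locations = np.unique(true_deg)
    # average the decoded location over all trials sharing a true location
    mean_decoded = np.array([decoded_deg[true_deg == loc].mean()
                             for loc in locations])
    slope, intercept = np.polyfit(locations, mean_decoded, 1)
    return slope, intercept
```

A slope near 1 with an intercept near 0 indicates that the readout tracks the true location equally well across the tested positions.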
Dissociating WM Representations of Past Visual Stimulation from Future Saccade Goals
In the memory-guided prosaccade task, the participant was instructed to plan and make a saccade toward the stimulus location. Therefore, we do not know if the population response encodes the visual target or the saccade goal because they are the same. To dissociate these two possible causes, we used a memory-guided antisaccade task in which the saccade goal is dissociated from the stimulus location (Figure 1A). We used the same IEM as in the prosaccade task to reconstruct the spatial representation of neural populations in each of the retinotopic areas during encoding and delay time epochs. To avoid any circularity in the reconstruction, we used the data corresponding to the encoding phase of the prosaccade task to calculate regression weights for the reconstruction at encoding and delay periods of the antisaccade task.
Figure 4 depicts the reconstructed retinotopic maps for both stimulus encoding and delay epochs for a sample participant and the average over all participants. As with the prosaccade data, the lines on the polar plot depict the relative channel coefficients across the circular space of the model. Here, we aligned all trials such that the location of the visual stimulus was at 45° and the goal of the antisaccade was at 135°. In all retinotopic areas we identified, the IEM reconstruction yielded two peaks, one at the location of the stimulus and another at the location of the saccade goal. Remarkably, this suggests that the population activity in these retinotopically organized cortical areas encodes and maintains locations that were never visually stimulated; these representations must therefore be the result of top–down signals.
As with the prosaccade data, we quantified the accuracy and precision of the model in each retinotopic area by estimating the trial-by-trial performance (Figure 5). Focusing solely on the delay period epoch, we compared separately both the decoded versus true location of the encoded stimulus and the decoded versus true location of the antisaccade goal. As depicted in the top row of Figure 5 for sample area V3, the true locations of the stimulus and antisaccade target were well predicted by the reconstructed locations as evidenced by regression slopes that were near 1 (range across ROIs [0.93, 1.12], largest p = 1.0 × 10^−4). In each of the retinotopic areas, except V2, the distribution of precision over trials was highly skewed toward 1 when decoding the stimulus (range across ROIs [−0.5, −0.24], with largest p = .003) and antisaccade target (range across ROIs [−1.0, −0.4], with largest p = .009) locations. For V2, the average skewness was −0.4 (p = .17). In a sample retinotopic area (V3), the middle row of Figure 5 depicts the distribution of precision of decoding the stimulus and antisaccade target locations based on the pattern of activity in the delay period. Next, we calculated the percentage of correctly decoded stimulus locations and correctly decoded antisaccade target locations within a boundary of 15°, 30°, or 45° of the true location. The bottom row of Figure 5 depicts the percentage of correctly decoded trials at different boundaries. For all retinotopic areas, the percentage of correctly decoded stimulus and antisaccade target locations was significantly above chance even at the narrowest boundary of 15° (largest p = 8.7 × 10^−56 for all boundaries). Overall, our model of the population activity in these retinotopically organized cortical areas accurately and precisely represented both the past stimulus location and the future antisaccade goal.
Population Dynamics Track the Evolution of the Shifting Prioritized Location
To perform the antisaccade task, one must first encode the location of the visual stimulus and then compute, plan, and maintain the location of the antisaccade target. As the trial evolves, the prioritized location therefore shifts from the stimulus location to the saccade goal. Consistent with this idea, and as can be seen in Figure 4, the reconstruction of the visual stimulus appears greater during the encoding epoch than during the delay period epoch, whereas the reconstruction of the antisaccade target appears greater during the delay than during the encoding epoch. Building on this observation, we quantified the gain of the stimulus and saccade target locations in the reconstructed visual space during the encoding and delay period epochs. To this end, we fit 2-D Gaussian functions to the peak locations in the reconstructions at both the stimulus and antisaccade target locations. In Figure 6, we plot the average gain at the stimulus and target locations during the encoding and delay epochs for each retinotopic area. As can be seen, there is a highly significant interaction in reconstruction gain between Epoch (encoding/delay) and Location (stimulus/saccade target) in all retinotopic areas (two-way ANOVA for V1, V2, V3, V3AB, and IPS0: smallest F(1, 99) = 597, p = 10^−81). This pattern supports the hypothesis that the spatiotemporal pattern of the population activity tracks the prioritized location as it evolves from the visual stimulus to the antisaccade goal.
Delay Period Activity in Frontal and Parietal Cortex Sculpts the Population Activity in Early Retinotopic Areas Improving the Precision of Readout
We demonstrated that the pattern of population activity in visual cortex represents the antisaccade goal. This representation is not the result of bottom–up visual stimulation and, therefore, must be instantiated by top–down feedback signals. To identify the source of these top–down signals, we searched for brain areas whose trial-by-trial fluctuations in the magnitude of delay period activity correlate with the precision with which our model could accurately reconstruct the remembered location in the antisaccade task. We first, for each retinotopic area, created a vector of values that represented the model accuracy, inversely proportional to the error of the reconstructed WM location. Second, using a GLM, we estimated, for all voxels in the brain, the delay period activity for each trial. Then, in an exploratory analysis, we calculated the correlation between decoding accuracy and delay period activity of each voxel. In individual participants, we found clusters of voxels whose delay period activity significantly predicted the decoding accuracy of the antisaccade goal from the pattern of activity in area V3 (Figure 7A). In the sample participant shown, we find significant clusters in dorsolateral pFC, superior precentral sulcus (sPCS), dorsal parietal cortex, and lateral occipital cortex. Despite some individual differences in the exact locations of significant voxel clusters within brain areas, we found consistent overlap in these four brain areas (Figure 7B). The histograms in Figure 7 depict the distribution of voxel correlations within each of the four brain areas, as well as their significance thresholds determined by permutation testing (Methods). In addition, Table 1 contains a list of all clusters of voxels in the brain, containing more than four contiguous voxels with overlap in at least four out of five participants, whose activity is significantly correlated with the decoding accuracy in areas V2 and V3. 
We focus here on areas V2 and V3 because the correlations were particularly strong and consistent, compared with the correlations with decoding from areas V1, V3AB, and IPS0, which were less consistent. The patterns of correlations identify both attentional and motor planning networks (Corbetta & Shulman, 2002) as potential sources for the modulatory top–down control signals prioritizing the saccade target location in the population activity in early visual cortex.
Table 1. Columns: Hemisphere, Label, x, y, z, Volume (Voxels). Sections: Clusters with Activity Correlating with Model Precision in Area V3; Clusters with Activity Correlating with Model Precision in Area V2.
Coordinates are in the Talairach TT-N27 space, and labels are based on FreeSurfer aparc.a2009s.annot annotations.
To summarize, we used model-based fMRI to reconstruct the locations of memory-guided saccade goals based on the pattern of activity in visual cortex during a memory delay. We could reconstruct the saccade goal even when we dissociated the location of the visual target from the saccade goal using a memory-guided antisaccade procedure. During these trials, the spatiotemporal population dynamics were informative, as they indicated an evolution from the past remembered visual stimulus to the future prospective goal. Early in the trial, the visual target location was primarily encoded in the population reconstruction. As the trial evolved in time, the population began to primarily encode the location of the saccade goal in the opposite hemifield. Therefore, the representation of the antisaccade goal cannot be the result of bottom–up visual stimulation and must be caused by top–down signals presumably originating from frontal and/or parietal cortex. In support of this hypothesis, we found that trial-by-trial fluctuations in delay period activity in frontal and parietal cortex predict the precision with which our model reconstructed the maintained saccade goal based on the pattern of activity in visual cortex. Therefore, the population dynamics in visual cortex encode WM representations, and these representations can be sculpted by top–down signals from frontal and parietal cortex (Sreenivasan, Curtis, & D'Esposito, 2014; Curtis & D'Esposito, 2003).
Based on probabilistic population coding theory, we believe that neural populations do not encode sensory features (in our task, the location of the saccade goal) but instead encode a probability distribution over sensory features (Ma, Beck, Latham, & Pouget, 2006; Zemel, Dayan, & Pouget, 1998). In this framework, the population characteristics, such as how the encoding of feature space is distributed across the topographic population and how a single neuron is tuned to variation over the feature space, determine the population dynamics. Using similar assumptions about the dynamics of populations of neurons, we built our IEM to decode spatial information encoded in the dynamics of populations of voxels in early visual cortex during WM. Using this model, we could successfully reconstruct a spatial representation of the saccade goal from the activity of neural ensembles in early visual cortex. The success of the model provides a strong linking hypothesis that connects decoding to population-level mechanisms. Specifically, we assumed an underlying neural architecture based on the retinotopic organization of the voxels within visual maps and modeled each voxel's response with a set of basis functions that tiled visual space. The success of spatial WM decoding depended on the fact that the precise structure of our model's basis set matched the representation of the feature space distributed across the neural population.
Computational models of microcircuits describe how WM representations are maintained by the balance between recurrent local excitation among similarly tuned neurons and long-range inhibition (Wang et al., 2013; Lundqvist, Compte, & Lansner, 2010; Wang, 1999). Two key predictions from these models are relevant. First, WM representations are encoded in the collective response of populations of neurons whose tuning varies along the stimulus dimension. Second, persistent activity in the subset of neurons tuned to the remembered feature maintains the representation over time. In human visual cortex, evidence exists that supports both of these predictions. The current study and other human neuroimaging studies have been successful at reconstructing representations of the population response encoding WM features (Sprague et al., 2014, 2016; Ester et al., 2015; van Bergen, Ma, Pratte, & Jehee, 2015; Ester, Anderson, Serences, & Awh, 2013; Sprague & Serences, 2013). Moreover, neural activity persists in early visual cortex during WM delays in humans, measured with BOLD imaging (Saber et al., 2015; Geng, Ruff, & Driver, 2009), and monkeys, measured with electrophysiology (van Kerkoerle, Self, & Roelfsema, 2017; Super, Spekreijse, & Lamme, 2001). Although the computational models were originally devised to model microcircuits in pFC, these results together open the possibility that similar mechanisms may support WM in early visual cortex.
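To make the logic of the inverted encoding model discussed in this section concrete, the following Python sketch simulates the two steps on synthetic data: estimating voxel-by-channel weights from training trials and then inverting those weights to reconstruct a spatial representation on held-out trials. It is an illustrative sketch only and not the analysis code used in this study. The number of channels, the raised-cosine basis over polar angle, the simulated voxel weights, the noise level, and all function names are assumptions introduced for the example.

```python
import numpy as np

N_POSITIONS = 360  # polar angle sampled in 1-degree steps (assumed)

def make_basis(n_channels=8, power=8):
    """Return an (n_channels, N_POSITIONS) set of raised-cosine channels."""
    theta = np.deg2rad(np.arange(N_POSITIONS))
    centers = np.deg2rad(np.arange(n_channels) * 360.0 / n_channels)
    # An even power keeps each channel non-negative and periodic over 360 deg.
    return np.cos((theta[None, :] - centers[:, None]) / 2.0) ** power

def fit_weights(bold_train, chan_train):
    """Encoding step: least-squares voxel-by-channel weights.

    bold_train is voxels x trials; chan_train is channels x trials.
    """
    return bold_train @ np.linalg.pinv(chan_train)

def reconstruct(weights, bold_test, basis):
    """Inversion step: estimate channel responses, then project onto the basis."""
    chan_hat = np.linalg.pinv(weights) @ bold_test  # channels x trials
    return chan_hat.T @ basis                       # trials x positions

def circular_error_deg(a, b):
    """Absolute angular difference in degrees, wrapped to [0, 180]."""
    d = np.abs(a - b) % 360
    return np.minimum(d, 360 - d)

def run_demo(n_voxels=100, n_train=200, n_test=100, noise_sd=0.1, seed=0):
    """Simulate voxels, fit and invert the model, return median decoding error."""
    rng = np.random.default_rng(seed)
    basis = make_basis()
    # Hypothetical ground-truth tuning of each simulated voxel.
    true_w = rng.uniform(0.0, 1.0, size=(n_voxels, basis.shape[0]))

    def simulate(n_trials):
        pos = rng.integers(0, N_POSITIONS, size=n_trials)
        chan = basis[:, pos]  # idealized channel responses per trial
        noise = noise_sd * rng.standard_normal((n_voxels, n_trials))
        return pos, chan, true_w @ chan + noise

    _, chan_train, bold_train = simulate(n_train)
    pos_test, _, bold_test = simulate(n_test)

    w_hat = fit_weights(bold_train, chan_train)
    recon = reconstruct(w_hat, bold_test, basis)
    decoded = np.argmax(recon, axis=1)  # peak of each reconstruction
    return float(np.median(circular_error_deg(decoded, pos_test)))

if __name__ == "__main__":
    print("median decoding error (deg):", run_demo())
```

In this simulation the reconstruction peaks near the remembered location only because the basis used for inversion matches the tuning that generated the simulated voxel responses, which mirrors the point made above about the model's basis set matching the population's representation of feature space.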
During the memory-guided antisaccade trials, when we dissociated the location of the visual target from the saccade goal, the spatiotemporal population dynamics first encoded the visual target and then encoded the saccade goal. Therefore, the representation of the antisaccade goal in early visual cortex must be the result of top–down signals and not some lingering visual response. Moreover, we recently demonstrated that fMRI activity persists in the specific parts of retinotopic maps that represent the location of memory-guided saccade goals (Saber et al., 2015). Whether these persistent BOLD responses in visual cortex reflect increases in spiking or only affect subthreshold synaptic activity remains unclear, but evidence for both exists (van Kerkoerle et al., 2017; Steinmetz & Moore, 2014; Zaksas & Pasternak, 2006; Super et al., 2001). Feedback signals from FEF increase the response gain of neurons in monkey areas V4 and MT (Merrikhi et al., 2017). Other studies confirm that the effects of top–down feedback signals originating from the monkey FEF are only detectable in extrastriate regions in the presence of bottom–up signals in visual cortex (sometimes referred to as “gating”; Ekstrom, Roelfsema, Arsenault, Bonmassar, & Vanduffel, 2008; Moore & Fallah, 2004; Moore & Armstrong, 2003). Together, these previous results suggest that feedback to visual cortex causes a multiplicative gain enhancement of visual responses. In our study of human visual cortex using fMRI, we find that top–down signals alone, without concomitant visual stimulation, are sufficient to induce a spatially specific pattern of topographic activity in visual cortex.
Overall, our results are consistent with the sensory-motor recruitment model of WM that posits that the same mechanisms that evolved to create internal representations of sensory events are also used to maintain WM representations (Postle, 2006; Pasternak & Greenlee, 2005; Theeuwes, Olivers, & Chizk, 2005; Curtis & D'Esposito, 2003) and add to a growing list of studies demonstrating that WM representations are encoded in the patterns of activity in sensory cortices (Sprague et al., 2014; Albers et al., 2013; Ester et al., 2009, 2013; Lee, Kravitz, & Baker, 2013; Sprague & Serences, 2013; Christophel, Hebart, & Haynes, 2012; Riggall & Postle, 2012; Harrison & Tong, 2009).
Finally, we tested the hypothesis that the source of the top–down signals that prioritize the visual and saccade goal locations originates in the frontal and/or parietal cortex. The trial-by-trial fluctuations in delay period activity in both frontal and parietal cortex correlated with the precision with which our model of visual cortex reconstructed the maintained saccade goal. Notably, delay period activity near the junction of the sPCS and the superior frontal sulcus, the location of the putative human homologue of monkey area FEF, predicted the precision of encoding of the WM representation in visual cortex. Feedback signals during WM originating in FEF cause spatially selective increases in the response gain of neurons and also expand and shift the receptive fields of V4 and MT neurons toward the memorized location (Merrikhi et al., 2017). Previous studies established the importance of the human sPCS for spatial WM. During spatial WM delays, neural activity in the human sPCS persists (Jerde, Merriam, Riggall, Hedges, & Curtis, 2012; Tark & Curtis, 2009; Srimal & Curtis, 2008; Schluppeck, Curtis, Glimcher, & Heeger, 2006; Curtis, Rao, & D'Esposito, 2004). Moreover, TMS (Mackey & Curtis, 2017) and cortical resections (Mackey, Devinsky, Doyle, Meager, et al., 2016) to sPCS impact the precision of memory-guided saccades. Given that the sPCS is necessary for spatial WM, that its activity persists during WM maintenance, and that it is topographically organized (Mackey, Winawer, & Curtis, 2017; Kastner et al., 2007), we propose that it maintains representations of spatial priority (Sprague & Serences, 2013; Jerde et al., 2012). Furthermore, priority maps may be the basis for top–down signals that sculpt the population dynamics in visual cortex in favor of neurons with receptive fields that match the locus of priority, providing a mechanism by which association cortex and sensory cortex interact to support WM.
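The trial-by-trial relationship described above can be illustrated with a short Python sketch on synthetic data. It is not the analysis performed in this study: the precision measure used here (a cosine-weighted "fidelity" of each reconstruction around the target location), the simulated link between delay period amplitude and reconstruction gain, and all function names are assumptions made for illustration.

```python
import numpy as np

def fidelity(recon, target_deg):
    """Per-trial cosine-weighted mean of a reconstruction around its target.

    recon is trials x positions (positions span 0-360 deg);
    target_deg holds the remembered location of each trial in degrees.
    Larger values mean more reconstruction energy near the target.
    """
    n_pos = recon.shape[1]
    theta = np.deg2rad(np.arange(n_pos) * 360.0 / n_pos)
    target = np.deg2rad(target_deg)[:, None]
    return np.mean(recon * np.cos(theta[None, :] - target), axis=1)

def trialwise_correlation(delay_amp, precision):
    """Pearson correlation between two per-trial measures."""
    return float(np.corrcoef(delay_amp, precision)[0, 1])

def demo_correlation(n_trials=200, seed=0):
    """Simulate trials in which frontal delay amplitude scales reconstruction gain."""
    rng = np.random.default_rng(seed)
    target_deg = rng.integers(0, 360, size=n_trials)
    # Hypothetical delay period amplitude in a frontal ROI (arbitrary units).
    delay_amp = rng.standard_normal(n_trials)
    # Assumed generative link: higher delay amplitude, stronger reconstruction.
    gain = 1.0 + 0.3 * delay_amp + 0.1 * rng.standard_normal(n_trials)
    theta = np.deg2rad(np.arange(360))
    bump = np.cos((theta[None, :] - np.deg2rad(target_deg)[:, None]) / 2.0) ** 8
    recon = gain[:, None] * bump + 0.05 * rng.standard_normal((n_trials, 360))
    return trialwise_correlation(delay_amp, fidelity(recon, target_deg))

if __name__ == "__main__":
    print("trial-by-trial correlation:", demo_correlation())
```

Because the simulated gain is built to depend on the delay amplitude, the resulting correlation is positive by construction; the sketch only shows how such a per-trial measure and its correlation could be computed.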
This study was supported by R01-EY016407 to C. E. C. We thank the NYU Center for Brain Imaging's Pablo Velasco, Keith Sanzenbach, and Valerio Luccio for help with data collection; John Serences and Thomas Sprague for help with the inverted encoding model; and Kartik Sreenivasan for helpful comments on the manuscript.
Reprint requests should be sent to Clayton E. Curtis, New York University, 6 Washington Place 859, New York, NY 10003, or via e-mail: firstname.lastname@example.org.