Primate ventral and dorsal visual pathways both contain visual object representations. Dorsal regions receive more input from the magnocellular system while ventral regions receive inputs from both magnocellular and parvocellular systems. Due to potential differences in the spatial sensitivities of magnocellular and parvocellular systems, object representations in ventral and dorsal regions may differ in how they represent visual input from different spatial scales. To test this prediction, we asked observers to view blocks of images from six object categories, shown in full spectrum, high spatial frequency (SF), or low SF. We found robust object category decoding in all SF conditions as well as SF decoding in nearly all the early visual, ventral, and dorsal regions examined. Cross-SF decoding further revealed that object category representations in all regions exhibited substantial tolerance across the SF components. No difference between ventral and dorsal regions was found in their preference for the different SF components. Further comparisons revealed that, whereas differences in the SF component separated object category representations in early visual areas, such a separation was much smaller in downstream ventral and dorsal regions. In those regions, variations among the object categories played a more significant role in shaping the visual representational structures. Our findings show that ventral and dorsal regions are similar in how they represent visual input from different spatial scales and argue against a dissociation of these regions based on differential sensitivity to different SFs.
Although visual object representation has traditionally been ascribed to regions of the primate ventral stream, research from the last two decades has unveiled the existence of robust object representations in primate dorsal stream regions (Vaziri-Pashkam & Xu, 2017, in press; Xu, 2017, in press; Kastner, Chen, Jeong, & Mruczek, 2017; Freud, Plaut, & Behrmann, 2016). Dorsal object representations exhibit tolerance to changes in low-level image characteristics such as position, size, and viewpoint (Vaziri-Pashkam & Xu, in press; Konen & Kastner, 2008; Lehky & Sereno, 2007; Sawamura, Georgieva, Vogels, Vanduffel, & Orban, 2005; Sereno & Maunsell, 1998). This suggests that dorsal object representations reflect fairly high levels of visual processing similar to those found in higher ventral object processing regions. Meanwhile, attention and task appear to have a stronger role in modulating object representations in dorsal than ventral regions (Bracci, Daniels, & Op de Beeck, 2017; Vaziri-Pashkam & Xu, 2017). These findings, together with a review of past and recent literature, suggest that there exist two types of visual representations in the primate brain. Specifically, although visual representations formed in the ventral visual cortex are largely invariant to the context of visual processing and provide a detailed and stable analysis of the visual world at multiple distinctive levels (Kravitz, Saleem, Baker, Ungerleider, & Mishkin, 2013), those formed in the dorsal regions are more adaptive to the context of visual processing by carrying salient and task-relevant information and protecting such information from decay and distraction (Xu, in press).
Here, we examined if object representations in ventral and dorsal regions are similar in how they represent visual input from different spatial scales (i.e., different spatial frequency (SF) components of the object images). There are reasons to speculate that different SF components would differentially impact ventral and dorsal object representations. These regions differ in the type of input they receive from the magnocellular and parvocellular systems with dorsal regions receiving more input from the magnocellular and ventral regions receiving input from both the magnocellular and parvocellular systems (Ferrera, Nealy, & Maunsell, 1994; Merigan & Maunsell, 1993; Maunsell, Nealey, & DePreist, 1990; Livingstone & Hubel, 1988; DeYoe & Van Essen, 1985; Shipp & Zeki, 1985). It has been suggested that the two systems may vary in their spatial frequency sensitivity with neurons in the magnocellular system having a lower spatial resolution and larger receptive field size, and as a result lower peak spatial frequency sensitivity than those in the parvocellular system (Merigan, Katz, & Maunsell, 1991; Derrington & Lennie, 1984; Kaplan & Shapley, 1982). Although it has been argued that these differences at the level of lateral geniculate nucleus are small and could be caused by the differences in receptive field eccentricity (e.g., Skottun, 2015; Levitt, Schumer, Sherman, Spear, & Movshon, 2001; Blakemore & Vital-Durand, 1986), there seem to be clear differences between the SF responses of the magnocellular- and parvocellular-driven parts of early visual cortex (Tootell, Silverman, Hamilton, Switkes, & De Valois, 1988) that later feed into the downstream dorsal and ventral regions.
In line with the dissociation of dorsal and ventral regions in their SF sensitivity, fMRI studies of visual processing have shown that dorsal regions such as V3A respond more highly to low-SF images (Tootell & Nasr, 2017; Henriksson, Nurminen, Hyvärinen, & Vanni, 2008; Singh, Smith, & Greenlee, 2000), whereas ventral regions such as V4 respond more highly to high-SF images (Tootell & Nasr, 2017) or show similar responses to low and high-SF images (Henriksson et al., 2008; Singh et al., 2000). These studies, however, have mostly used simple gratings as their stimuli. It is unclear whether these results would generalize to complex stimuli such as real-world objects and object categories. Moreover, these studies have only examined lower dorsal regions, such as V3A. SF sensitivity of higher dorsal regions has not been thoroughly investigated.
Studies that have used more natural images to investigate SF sensitivity in the human brain have focused primarily on ventral regions with most studies reporting higher responses to high-SF image components in various regions in the ventral pathway (Fintzi & Mahon, 2013; Iidaka, Yamashita, Kashikura, & Yonekura, 2004). It has been further suggested that within occipitotemporal cortex different subregions process high- and low-SF image components (Rotshtein, Vuilleumier, Winston, Driver, & Dolan, 2007). In the parahippocampal cortex, it has been shown that category selectivity is more pronounced for high-SF images (Rajimehr, Devaney, Bilenko, Young, & Tootell, 2011) and that sensitivity to spatial information is modulated by the SF components of the image (Zeidman, Mullally, Schwarzkopf, & Maguire, 2012) although evidence for higher responses to low SF compared to high SF scenes has also been reported in the right parahippocampal cortex (Peyrin, Baciu, Segebarth, & Marendaz, 2004).
Less attention has been paid to the effect of SF on object responses in the dorsal stream. Mahon, Kumar, and Almeida (2013) examined whether or not there was a preferential representation for tools over animals in intraparietal sulcus (IPS) and how such a response would be modulated by SF. They reported that the inferior part of IPS showed higher activation for high SF tools than high SF animals whereas the superior part of IPS showed higher activation for low SF tools than low SF animals. Because the graspable tools used in the study tended to have an elongated shape whereas the animals used tended to have a rounded shape and because the high-SF images contained more visual details than the low-SF images, the differential activation profile observed here could reflect a difference in shape processing rather than a difference in SF sensitivity. Importantly, the critical comparison of high SF animals and tools versus low SF animals and tools was not performed. It is thus unclear whether dorsal regions show a preference in response amplitude to the high or low-SF component of an object image.
All the studies mentioned above have used average fMRI response amplitude to measure SF sensitivity of individual regions. Despite having high power in detecting overall response differences, amplitude measures do not tell us whether the underlying visual representations differ for images appearing in different SF components or whether they merely reflect a change in the overall activation without a change in the nature of the representations. In this study, using fMRI multivoxel pattern analysis (MVPA), we addressed this question by examining object representations formed from different SF components of object images in large swaths of the posterior cortex including early visual, ventral, and dorsal regions.
In early visual and ventral regions, we included topographic areas V1 to V4 and areas involved in visual object processing in lateral occipitotemporal (LOT) and ventral occipitotemporal (VOT) regions. LOT and VOT loosely corresponded to the location of the lateral occipital and posterior fusiform areas (Grill-Spector et al., 1998; Malach et al., 1995) but extended further into the temporal cortex in our effort to include as many object-selective voxels as possible in the ventral region. Responses in these regions have been shown to correlate with successful visual object detection and identification (e.g., Williams, Dang, & Kanwisher, 2007; Grill-Spector, Kushnir, Hendler, & Malach, 2000), and lesions to these areas have been linked to visual object agnosia (Farah, 2004; Goodale, Milner, Jakobson, & Carey, 1991). In dorsal regions, we included regions previously shown to exhibit robust visual responses along the IPS. Because existing studies have demarcated regions in the dorsal pathway using either topographic mapping or functional localizers with perception and visual working memory (VWM) tasks and because the two sets of regions defined do not show a complete overlap, we included both for completeness. Specifically, we included topographic regions V3A, V3B, and IPS0–IPS4 (Silver & Kastner, 2009; Swisher, Halko, Merabet, McMains, & Somers, 2007; Sereno et al., 1995) and two functionally defined object-selective regions, with one located at the inferior part and one at the superior part of IPS (henceforth referred to as inferior IPS and superior IPS, respectively, for simplicity). Inferior IPS has been shown to be involved in visual object selection and individuation and overlaps to a great extent with topographic regions V3A/V3B/IPS0, whereas superior IPS is associated with visual representation and VWM storage and overlaps with IPS1/IPS2 (Bettencourt & Xu, 2016a, 2016b; Jeong & Xu, 2016; Xu & Jeong, 2015; Xu & Chun, 2006, 2009; see also Todd & Marois, 2004).
We found robust object category decoding in different SF components in all the regions examined and decoding for SF in most regions examined. Moreover, similar object representations were formed from the different SF components of object images, as all the regions examined exhibited substantial tolerance across the SF components. Further analysis revealed that, whereas the SF component was a prominent feature in determining the representational structure of early visual areas, object categories, and not the SF component, appeared to play a more prominent role in shaping the representational structures of ventral and dorsal regions. Overall, ventral and dorsal regions appeared to be similar in how they represent visual input from different spatial scales, and there was no dissociation of these regions based on differential sensitivity to the SF components of an image.
Ten (five women) right-handed healthy participants with normal or corrected-to-normal visual acuity and aged between 18 and 40 participated in the experiment. All participants gave their informed consent before the experiment and received payment for their participation. The experiment was approved by the Committee on the Use of Human Subjects at Harvard University.
Experimental Design and Procedures
In this experiment, we used grayscale images from six object categories (faces, bodies, houses, elephants, cars, and chairs) and modified them to occupy roughly the same area on the screen (Figure 1A). These categories were chosen as they cover a range of natural and manmade object categories; include animate, inanimate, large, and small objects; and include some of the typical categories used in previous investigations of object category representations in ventral regions (e.g., Kriegeskorte et al., 2008; Haxby et al., 2001). For each object category, we selected 10 different exemplar objects that varied in identity, pose, size, and viewing angle to minimize the low-level similarities among them. The width of the objects varied between 2.5° and 7.4°, and the height varied between 2.0° and 7.4°. All images were placed on a dark gray square (subtended 7.8° × 7.8°) and displayed on a light gray background. These images were part of a larger set of images used in a previous study (Vaziri-Pashkam & Xu, 2017).
Stimuli were presented in three conditions: Full-SF, High-SF, and Low-SF. In the Full-SF condition, the full spectrum images were shown without modification of the SF content. In the High-SF condition, images were high-pass filtered using an FIR filter with a cutoff frequency of 4.40 cycles per degree (Figure 1B). In the Low-SF condition, the images were low-pass filtered using an FIR filter with a cutoff frequency of 0.62 cycles per degree (Figure 1B). The DC component was restored after filtering so that the image backgrounds were equal in luminance. We used a block design and presented 8-sec image blocks separated by 8-sec fixation periods. An additional 12-sec fixation period was also shown at the beginning of each experimental run. Each image block contained a random sequential presentation of 10 exemplars from the same object category. Each image was presented for 200 msec followed by a 600-msec blank interval between the images (Figure 1C). Participants were asked to fixate at a central red dot (0.43° in diameter) throughout the experiment and detect a 1-back repetition of the exact same image by pressing a key on an MR-compatible button box. Two image repetitions occurred randomly in each image block. Each run contained 18 blocks (6 categories × 3 image formats), with the order of the six object categories and the three image formats counterbalanced across runs. Each participant completed a single-scan session containing 18 experimental runs, with each lasting 5 min. Eye movements were monitored using SR Research EyeLink 1000 to ensure proper fixation.
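The timing figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, with illustrative variable names that are not from the original study. It confirms that the stated image and block durations add up to 8-sec blocks and 5-min runs, and it converts the two filter cutoffs into cycles per image under the assumption that the image spans the full 7.8° square.

```python
# Timing values taken from the text; variable names are illustrative.
IMAGE_MS = 200
BLANK_MS = 600
IMAGES_PER_BLOCK = 10
BLOCKS_PER_RUN = 18        # 6 categories x 3 image formats
FIXATION_S = 8             # fixation period paired with each image block
INITIAL_FIXATION_S = 12    # extra fixation at the start of a run

block_s = IMAGES_PER_BLOCK * (IMAGE_MS + BLANK_MS) / 1000
run_s = INITIAL_FIXATION_S + BLOCKS_PER_RUN * (block_s + FIXATION_S)

# Assumption: the filtered image spans the full 7.8 deg square.
IMAGE_DEG = 7.8
high_cutoff_cycles = 4.40 * IMAGE_DEG   # High-SF cutoff, cycles per image
low_cutoff_cycles = 0.62 * IMAGE_DEG    # Low-SF cutoff, cycles per image

print(f"block: {block_s:.0f} s, run: {run_s / 60:.0f} min")
print(f"cutoffs: {high_cutoff_cycles:.1f} and {low_cutoff_cycles:.1f} cycles per image")
```

Under that assumption, the High-SF images retain only structure finer than roughly 34 cycles per image, and the Low-SF images retain only structure coarser than roughly 5 cycles per image.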
All the localizer experiments conducted here used previously established protocols, and the details of these protocols are reproduced here for the reader's convenience.
Topographic visual regions.
These regions were mapped with flashing colored checkerboards using standard techniques (Swisher et al., 2007; Sereno et al., 1995) with parameters optimized following Swisher et al. (2007) to reveal maps in parietal cortex. Specifically, a polar angle wedge with an arc of 72° swept across the entire screen (23.4° × 17.5° of visual angle). The wedge had a sweep period of 55.467 sec, flashed at 4 Hz, and swept for 12 cycles in each run (for more details, see Swisher et al., 2007). Each run lasted 11 min 56 sec. The task varied slightly across participants. All participants were asked to detect a dimming in the visual display. For two of the participants, the dimming occurred only at fixation, and for the rest, it occurred within the polar angle wedge, commensurate with the various methodologies used in the literature (Bressler & Silver, 2010; Swisher et al., 2007). No differences were observed in the maps obtained through these two methods.
VOT and LOT.
To identify LOT and VOT ROIs, following Kourtzi and Kanwisher (2000), participants viewed blocks of face, scene, object, and scrambled object images (all subtended approximately 12.0° × 12.0°). The images were grayscaled photographs of male and female faces, common objects (e.g., cars, tools, and chairs), indoor and outdoor scenes, and phase-scrambled versions of the common objects. Participants monitored a slight spatial jitter that occurred randomly once every 10 images. Each run contained four blocks of each of scenes, faces, objects, and phase-scrambled objects. Each block lasted 16 sec and contained 20 unique images, with each appearing for 750 msec and followed by a 50-msec blank display. Besides the stimulus blocks, 8-sec fixation blocks were included at the beginning, middle, and end of each run. Each participant was tested with two or three runs, each lasting 4 min 40 sec.
To identify the superior IPS ROI previously shown to be involved in VWM storage (Xu & Chun, 2006; Todd & Marois, 2004), we followed the procedures developed by Xu and Chun (2006) and implemented by Xu and Jeong (2015). In an event-related object VWM experiment, participants viewed, in the sample display, a brief presentation of one to four everyday objects and, after a short delay, judged whether a new probe object in the test display matched the category of the object shown in the same position as in the sample display. A match occurred in 50% of the trials. Grayscaled photographs of objects from four categories (shoes, bikes, guitars, and couches) were used. Objects could appear above, below, to the left, or to the right of the central fixation. Object locations were marked by white rectangular placeholders that were always visible during the trial. The placeholders subtended 4.5° × 3.6° and were 4.0° away from the fixation (center to center). The entire display subtended 12.5° × 11.8°. Each trial lasted 6 sec and contained the following: fixation (1000 msec), sample display (200 msec), delay (1000 msec), test display/response (2500 msec), and feedback (1300 msec). With a counterbalanced trial history design (Xu & Chun, 2006; Todd & Marois, 2004), each run contained 15 trials for each set size and 15 fixation trials in which only the fixation dot was present for 6 sec. Two filler trials, which were excluded from the analysis, were added at the beginning and end of each run, respectively, for practice and trial history balancing purposes. Participants were tested with two runs, each lasting 8 min.
To identify the inferior IPS ROI, following the procedure developed by Xu and Chun (2006) and implemented by Xu and Jeong (2015), participants viewed blocks of objects and noise images. The object images were similar to the images used in the superior IPS localizer, except that, in all trials, four images were presented on the display. The noise images were generated by phase-scrambling the entire object images. Each block lasted 16 sec and contained 20 images, each appearing for 500 msec followed by a 300-msec blank display. Participants were asked to detect the direction of a slight spatial jitter (either horizontal or vertical), which occurred randomly once in every 10 images. Each run contained eight object blocks and eight noise blocks. Each participant was tested with two or three runs, each lasting 4 min 40 sec.
MRI data from the first six participants were collected using a Siemens MAGNETOM Trio, A Tim System 3T scanner, with a 32-channel receiver array head coil. Data from the last four participants were collected after the scanner was upgraded to a Prisma system. All scanning took place at the Harvard University Center for Brain Science imaging facility. Participants lay on their back inside the MRI scanner and viewed the back-projected LCD with a mirror mounted inside the head coil. The display had a refresh rate of 60 Hz and a spatial resolution of 1024 × 768. An Apple MacBook Pro laptop was used to present the stimuli and collect the motor responses. For topographic mapping, the stimuli were presented using VisionEgg (Straw, 2008). All other stimuli were presented with MATLAB running Psychtoolbox extensions (Brainard, 1997).
Each participant completed three MRI scan sessions to obtain data for the high-resolution anatomical scans, the topographic maps, the functional ROIs, and the experimental scans. A high-resolution (1.0 × 1.0 × 1.3 mm) structural image was obtained for surface reconstruction. For all scans, gradient-echo pulse sequences were used to acquire the fMRI data. For the experimental scans, 33 axial slices parallel to the AC–PC line (3 mm thick, 3 × 3 mm in-plane resolution with 20% skip) were used to cover the whole brain (repetition time [TR] = 2 sec, echo time [TE] = 29 msec, flip angle = 90°, matrix = 64 × 64). For the LOT/VOT and inferior IPS localizer scans, 30–31 axial slices parallel to the AC–PC line (3 mm thick, 3 × 3 mm in-plane resolution with no skip) were used to cover occipital, temporal, and parts of parietal and frontal lobes (TR = 2 sec, TE = 30 msec, flip angle = 90°, matrix = 72 × 72). For the superior IPS localizer scans, 24 axial slices parallel to the AC–PC line (5 mm thick, 3 × 3 mm in-plane resolution with no skip) were used to cover most of the brain with priority given to parietal and occipital cortices (TR = 1.5 sec, TE = 29 msec, flip angle = 90°, matrix = 72 × 72). For topographic mapping, 42 slices (3 mm thick, 3.125 × 3.125 mm in-plane resolution with no skip) just off parallel to the AC–PC line were collected to cover the whole brain (TR = 2.6 sec, TE = 30 msec, flip angle = 90°, matrix = 64 × 64). Different slice prescriptions were used here for the different localizers to be consistent with the parameters used in our previous studies. Because the localizer data were projected into the volume view and then onto individual participants' flattened cortical surface, the exact slice prescriptions used had minimal impact on the final results.
fMRI data were analyzed using FreeSurfer (surfer.nmr.mgh.harvard.edu, Dale, Fischl, & Sereno, 1999) and in-house MATLAB code. LibSVM software (Chang & Lin, 2011) was used for the MVPA support vector machine (SVM) analysis. fMRI data preprocessing included 3-D motion correction, slice timing correction, and linear and quadratic trend removal. No smoothing was applied to the main experimental data. All analyses for the main experiment were performed in the volume. The ROIs were selected on the surface and then projected back to the volume for the MVPA.
Following the procedures described in Swisher et al. (2007) and by examining phase reversals in the polar angle maps, we were able to identify topographic areas within occipital and parietal cortices including V1, V2, V3, V4, V3A, V3B, IPS0, IPS1, IPS2, IPS3, and IPS4 in each participant (Figure 2A). Activations from IPS3 and IPS4 were, in general, less robust than those from other IPS regions. Consequently, the localization of these two IPS ROIs was less reliable. Nonetheless, we decided to include these two ROIs here to have more extensive coverage of regions in the posterior parietal cortex.
LOT and VOT.
These two ROIs (Figure 2C and D) were defined as a cluster of contiguous voxels in the lateral and ventral occipital cortex, respectively, that responded more (p < .001 uncorrected) to the original than to the scrambled object images. LOT and VOT loosely correspond to the location of the lateral occipital and posterior fusiform areas (Kourtzi & Kanwisher, 2000; Grill-Spector et al., 1998; Malach et al., 1995) but extend further into the temporal cortex in an effort to include as many object-selective voxels as possible in occipital-temporal cortex. For two participants for LOT and for one participant for VOT, the threshold of p < .001 resulted in too few voxels so the threshold was relaxed to p < .01 to have at least 100 voxels across the two hemispheres.
To identify the superior IPS ROI (Figure 2B), fMRI data from the superior IPS localizer were analyzed using a linear regression analysis to determine voxels whose responses tracked the participant's behavioral VWM capacity estimated using Cowan's K formula (Cowan, 2001). Superior IPS was defined as a region in the parietal cortex that showed significant activation (Todd & Marois, 2004). Initially, we defined superior IPS with a threshold of p < .001 uncorrected. However, for some participants, this produced an ROI that contained too few (fewer than 20) voxels for MVPA decoding. Therefore, we selected p < .001 (uncorrected) in four participants and relaxed the threshold to 0.01, 0.05, or 0.1 for the other participants to obtain a reasonably large superior IPS with at least 100 voxels across the two hemispheres. This produced an ROI with a range of 150–423 voxels and an average of 234 voxels across all the participants.
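For reference, Cowan's K for a single-probe task is commonly written as K = N × (H − FA), where N is the set size, H the hit rate, and FA the false alarm rate. The sketch below applies that formula. The function name and the example rates are hypothetical and are not data from this study.

```python
def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Estimate VWM capacity as K = N * (H - FA)."""
    return set_size * (hit_rate - false_alarm_rate)

# Hypothetical (hit rate, false alarm rate) per set size; not study data.
example_rates = {1: (0.98, 0.02), 2: (0.95, 0.05), 3: (0.85, 0.10), 4: (0.75, 0.15)}
k_by_set_size = {n: cowan_k(n, h, fa) for n, (h, fa) in example_rates.items()}

for n, k in k_by_set_size.items():
    print(f"set size {n}: K = {k:.2f}")
```

Per-set-size K estimates of this kind are what the voxel responses would be regressed against when looking for regions that track behavioral VWM capacity.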
The inferior IPS ROI (Figure 2B) was defined as a cluster of contiguous voxels in the intraparietal sulcus that responded more (p < .001 uncorrected) to the intact than to the scrambled object images and that did not overlap with the superior IPS and LOT regions. For one participant, the threshold was relaxed to 0.1 to obtain a region with a reasonable number of voxels across hemispheres (the final size of inferior IPS was 90 voxels for this participant).
Within-SF object category decoding.
To determine whether object category information is present in each ROI, we performed MVPA using an SVM classifier. We calculated the decoding accuracy for object categories in each ROI and each SF condition. To do so, we first performed a general linear model analysis with 18 factors (3 SF conditions × 6 object categories) separately for each run to obtain the beta values for the 18 conditions in each voxel of the brain and each run. We then extracted the fMRI response patterns for each ROI by only including the voxels in that ROI. To remove amplitude differences between conditions and ROIs, we normalized the beta values across all voxels in each ROI and each condition using z-score transformation. As a result, the voxel amplitudes for each ROI and each condition had a mean of zero and a standard deviation of 1. Following Kamitani and Tong (2005), we conducted an SVM analysis with a leave-one-out cross-validation procedure to decode each pair of object categories separately for each of the three SF conditions. Because pattern decoding depends on the total number of voxels in an ROI, we equated the number of voxels across ROIs to facilitate comparisons: the 75 most informative voxels were selected from each ROI using a t-test analysis (Mitchell et al., 2004). Specifically, during each SVM training and testing iteration, the 75 voxels with the lowest p values for discriminating between the two conditions of interest were selected from the training data. An SVM was then trained and tested only on these voxels. We calculated the decoding performance for discriminating pairs of object categories (15 total pairs) and pooled these results to determine the average decoding accuracy in each ROI for a given SF condition.
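The steps above (z-scoring across voxels, per-fold voxel selection, and leave-one-run-out testing) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the analysis code used in the study. All function names are hypothetical, the data are synthetic, and a nearest-centroid rule stands in for the LibSVM classifier so that the example stays self-contained.

```python
import numpy as np

def zscore_patterns(betas):
    """Z-score each pattern (last axis = voxels) to mean 0 and SD 1."""
    betas = np.asarray(betas, dtype=float)
    mean = betas.mean(axis=-1, keepdims=True)
    sd = betas.std(axis=-1, keepdims=True)
    return (betas - mean) / sd

def select_voxels(train_a, train_b, n_keep):
    """Pick the n_keep voxels with the largest |t| between two conditions.

    All voxels share the same degrees of freedom, so ranking by |t|
    is equivalent to ranking by the lowest two-sided p value.
    """
    na, nb = len(train_a), len(train_b)
    pooled_var = ((na - 1) * train_a.var(axis=0, ddof=1)
                  + (nb - 1) * train_b.var(axis=0, ddof=1)) / (na + nb - 2)
    se = np.sqrt(pooled_var * (1 / na + 1 / nb)) + 1e-12
    t = (train_a.mean(axis=0) - train_b.mean(axis=0)) / se
    return np.argsort(-np.abs(t))[:n_keep]

def pairwise_decoding(cat_a, cat_b, n_keep=75):
    """Leave-one-run-out accuracy for one category pair.

    cat_a, cat_b: arrays of shape (n_runs, n_voxels), one pattern per run.
    Assumption: a nearest-centroid rule replaces the SVM used in the study.
    """
    n_runs = cat_a.shape[0]
    n_correct = 0
    for test_run in range(n_runs):
        train = np.arange(n_runs) != test_run
        # Voxel selection uses the training runs only, to avoid leakage.
        keep = select_voxels(cat_a[train], cat_b[train], n_keep)
        centroid_a = cat_a[train][:, keep].mean(axis=0)
        centroid_b = cat_b[train][:, keep].mean(axis=0)
        for pattern, is_a in ((cat_a[test_run, keep], True),
                              (cat_b[test_run, keep], False)):
            closer_to_a = (np.linalg.norm(pattern - centroid_a)
                           < np.linalg.norm(pattern - centroid_b))
            n_correct += int(closer_to_a == is_a)
    return n_correct / (2 * n_runs)

# Synthetic demo: 18 runs, 200 voxels, 20 of which carry category signal.
rng = np.random.default_rng(0)
signal = np.zeros(200)
signal[:20] = 2.0
cat_a = zscore_patterns(rng.normal(size=(18, 200)) + signal)
cat_b = zscore_patterns(rng.normal(size=(18, 200)))
accuracy = pairwise_decoding(cat_a, cat_b)
print(f"pairwise decoding accuracy: {accuracy:.2f}")
```

Selecting voxels inside each cross-validation fold, rather than once on all runs, keeps the held-out run from influencing which voxels the classifier sees. Averaging `pairwise_decoding` over all 15 category pairs would give the per-ROI accuracy described in the text.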
In this analysis and all subsequent classification analyses, when results from all participants were combined to perform group-level statistical analyses, all p values reported were corrected for multiple comparisons using the Benjamini–Hochberg procedure for false discovery rate controlled at q < 0.05 (Benjamini & Hochberg, 1995). In the analysis of the 15 ROIs, the correction was applied to 15 comparisons, and in the analysis of the three representative regions, the correction was applied to three comparisons.
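The Benjamini–Hochberg correction can be expressed as adjusted p values: the sorted p values are scaled by m / rank, and monotonicity is then enforced from the largest p value downward. The sketch below is a plain-Python illustration. The function name and the example p values are hypothetical and are not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p values for four ROIs (not results from the study).
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adjusted_p]
print([round(p, 3) for p in adjusted_p], significant)
```

An ROI is declared significant at q < 0.05 when its adjusted p value falls below 0.05, so in this toy example all four comparisons survive correction.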
Cross-SF object category decoding.
To determine whether or not object category representations in each ROI were similar (i.e., exhibited tolerance) between the high- and low-SF components of an object image, we tested whether the object category representations formed in each ROI were specific to a particular SF component or could be generalized to a different SF component. To do so, we employed the same decoding procedure as described in the previous section and first trained the classifier to discriminate between two object categories in either the low- or high-SF condition. We then tested the classifier's performance in discriminating the same two object categories in the opposite SF condition (i.e., trained in the high-SF condition and tested in the low-SF condition, or the reverse). We repeated the same procedure for every pair of objects (15 pairs) and pooled the results to determine the average cross-SF object category decoding accuracy for each ROI.
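Cross-SF decoding differs from within-SF decoding only in where the training and test patterns come from. The sketch below illustrates the idea in Python with NumPy on synthetic data. It assumes that held-out runs are used for testing, as in the within-SF procedure, and it uses a nearest-centroid rule as a stand-in for the SVM. All names are hypothetical.

```python
import numpy as np

def centroid_accuracy(train_a, train_b, test_a, test_b):
    """Accuracy of a nearest-centroid rule (stand-in for the SVM)."""
    centroid_a = train_a.mean(axis=0)
    centroid_b = train_b.mean(axis=0)

    def closer_to_a(patterns):
        return (np.linalg.norm(patterns - centroid_a, axis=1)
                < np.linalg.norm(patterns - centroid_b, axis=1))

    n_correct = closer_to_a(test_a).sum() + (~closer_to_a(test_b)).sum()
    return n_correct / (len(test_a) + len(test_b))

def cross_sf_accuracy(high_a, high_b, low_a, low_b):
    """Train on one SF condition, test on the other, in both directions.

    Each input has shape (n_runs, n_voxels) for one category in one SF
    condition. Assumption: the test patterns come from a held-out run.
    """
    n_runs = high_a.shape[0]
    accuracies = []
    for test_run in range(n_runs):
        train = np.arange(n_runs) != test_run
        held = [test_run]
        # Train on High-SF, test on Low-SF.
        accuracies.append(centroid_accuracy(
            high_a[train], high_b[train], low_a[held], low_b[held]))
        # Train on Low-SF, test on High-SF.
        accuracies.append(centroid_accuracy(
            low_a[train], low_b[train], high_a[held], high_b[held]))
    return float(np.mean(accuracies))

# Synthetic demo: category signal shared across SF, plus an SF-specific offset.
rng = np.random.default_rng(1)
n_runs, n_voxels = 18, 100
category = np.zeros(n_voxels)
category[:20] = 1.0
sf_offset = np.zeros(n_voxels)
sf_offset[20:40] = 1.0

def noise():
    return rng.normal(size=(n_runs, n_voxels))

high_a, high_b = noise() + category + sf_offset, noise() - category + sf_offset
low_a, low_b = noise() + category - sf_offset, noise() - category - sf_offset
cross_accuracy = cross_sf_accuracy(high_a, high_b, low_a, low_b)
print(f"cross-SF accuracy: {cross_accuracy:.2f}")
```

In this toy case the category signal lives in different voxels from the SF offset, so a classifier trained in one SF condition transfers well to the other. A drop in cross-SF accuracy relative to within-SF accuracy would indicate that part of the category code is SF specific.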
We also compared cross-SF object category decoding accuracy with within-SF object category decoding accuracy in which training and testing were done within the same SF condition (i.e., decoding was first performed within the low-SF and within the high-SF conditions separately, and the two results were then averaged).
Decoding the SF component.
In addition to object category representation, we also directly examined whether the difference between the low- and high-SF components of an object image was represented in each ROI. This was done by measuring the decoding accuracy for discriminating the low- and high-SF image components of each object category and then averaging the results across object categories.
Comparing object category and SF component representations.
To directly visualize the object category representation similarity in a given ROI and how it might be influenced by the two SF components from the pairwise category decoding accuracies within and across SF conditions, we constructed a representational similarity matrix for the six object categories across the low- and high-SF conditions for each participant. The resulting matrix was then averaged across all the participants. We transformed this averaged similarity matrix to a distance matrix by subtracting 0.5 from all the values to obtain a matrix with a diagonal of zero and off-diagonals greater than zero. If the value of a cell in the matrix was below zero after subtracting 0.5, it was replaced with zero. This was done to provide a valid distance input matrix for the multidimensional scaling (MDS) analysis (Shepard, 1980). We then performed dimensionality reduction using MDS and projected the first two dimensions of the representation similarity matrix onto a 2-D space with the distance between the categories denoting their relative similarities to each other. To calculate the percent variance explained by the two dimensions, the distances between the points on the two dimensions of the MDS were calculated, and then the r² between these 2-D distances and the original representational similarity matrix was computed. To quantify the observations made with MDS, we also directly compared the decoding of an object category from the other object categories when they shared the same SF component (within-SF category decoding) to the decoding of an object category from the other object categories having the opposite SF component (between-SF category decoding).
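The conversion from decoding accuracies to distances, the 2-D projection, and the r² computation can be sketched as follows in Python with NumPy. The text does not state which MDS variant was used, so classical (Torgerson) MDS is an assumption here, and all function names are hypothetical. The demo uses synthetic accuracies built from points that lie exactly in a plane, so the 2-D solution should recover them almost perfectly.

```python
import numpy as np

def accuracy_to_distance(accuracy):
    """Subtract chance (0.5) and replace negative values with zero."""
    return np.clip(np.asarray(accuracy, dtype=float) - 0.5, 0.0, None)

def classical_mds(distance, n_dims=2):
    """Classical (Torgerson) MDS via eigendecomposition.

    Assumption: the study's exact MDS variant is not specified.
    """
    n = distance.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (distance ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

def variance_explained(distance, coords):
    """r^2 between the original distances and the 2-D MDS distances."""
    upper = np.triu_indices(distance.shape[0], k=1)
    diffs = coords[:, None, :] - coords[None, :, :]
    mds_distance = np.linalg.norm(diffs, axis=-1)
    r = np.corrcoef(distance[upper], mds_distance[upper])[0, 1]
    return r ** 2

# Synthetic demo: 12 conditions (6 categories x 2 SFs) placed in a plane.
rng = np.random.default_rng(2)
true_coords = rng.uniform(0.0, 0.3, size=(12, 2))
true_distance = np.linalg.norm(
    true_coords[:, None, :] - true_coords[None, :, :], axis=-1)
accuracy_matrix = 0.5 + true_distance     # the diagonal sits at chance, 0.5

distance = accuracy_to_distance(accuracy_matrix)
coords = classical_mds(distance)
r_squared = variance_explained(distance, coords)
print(f"variance explained by two dimensions: {r_squared:.3f}")
```

With real decoding accuracies the distances are not exactly Euclidean, so r² falls below 1 and indicates how much of the representational structure the 2-D plot captures.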
To compare representations in early visual and higher ventral and dorsal regions and to streamline our analyses, we performed the above analysis on three representative regions: V1, VOT, and superior IPS. We chose V1 as a representative early visual area as it is the first cortical region where visual information is processed. We chose VOT as a representative ventral region as it is one of the high-level regions in the ventral visual processing pathway. Lastly, we chose superior IPS as a representative dorsal region as it has been shown to be capable of representing a diverse array of visual information in perception and VWM tasks. This selection allows us to make a fair comparison between higher regions in ventral and dorsal pathways and to contrast them with early visual area V1.
Eye Movement Data Analysis
To analyze the eye position data, we first removed saccades and blinks. We then corrected for eye movement measurement drifts across experimental runs by subtracting the median deviation in eye position during the first fixation block from that during the stimulus presentation blocks in each run. The median eye position deviation for each stimulus condition was then calculated across runs in each observer and then analyzed at the group level.
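The drift correction described above amounts to a difference of medians within each run. A minimal sketch in plain Python follows. The function name and the sample values are hypothetical and are not data from the study.

```python
from statistics import median

def drift_corrected_deviation(fixation_samples, stimulus_samples):
    """Median stimulus-block eye position relative to the first fixation block.

    Both arguments are eye positions (in degrees) along one axis for one
    run, with saccade and blink samples already removed.
    """
    return median(stimulus_samples) - median(fixation_samples)

# Hypothetical horizontal positions for one run (not data from the study).
fixation_x = [0.30, 0.32, 0.28, 0.31, 0.29]
stimulus_x = [0.35, 0.33, 0.36, 0.34, 0.90]   # includes one outlier sample
deviation_x = drift_corrected_deviation(fixation_x, stimulus_x)
print(f"drift-corrected deviation: {deviation_x:.2f} deg")
```

Using the median rather than the mean keeps a few stray samples, such as the outlier in this example, from dominating the estimate.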
To examine how visual object representations in early visual, ventral, and dorsal regions of the human brain are affected by the SF components of an object image, in this study, we examined both univariate (i.e., average response amplitudes) and multivariate (i.e., MVPA classifications) object category responses while manipulating the SF component of the object images. We presented the original full-spectrum (Full-SF) images and modified images that only contained the high-SF (High-SF, >4.40 cpd) or low-SF (Low-SF, <0.62 cpd) component of the original image (Figure 1B). We used grayscale images from six object categories (faces, bodies, houses, elephants, cars, and chairs). These categories were chosen as they covered a range of natural and manmade object categories and are some of the typical categories used in previous investigations of object category representations in ventral regions (e.g., Kriegeskorte et al., 2008; Haxby et al., 2001).
To examine how the different SF components of an object image would affect object responses, we compared the average fMRI responses in our ROIs across all object categories among the three SF conditions (Figure 3). In all early visual (V1–V3) and ventral regions (V4, LOT, and VOT), although the average response between the Full-SF and High-SF conditions did not differ (ts < 2.25, ps > .22), they were both higher than that of the Low-SF condition (ts > 2.84, ps < .05; all pairwise t tests reported were corrected for multiple comparisons for the number of brain regions examined using the Benjamini–Hochberg procedure with false discovery rate set at q < 0.05; this applies to all subsequent pairwise t tests involving multiple brain regions). In dorsal regions including V3A, V3B, IPS0–IPS4, inferior IPS, and superior IPS, the Full-SF condition did not differ from either the High-SF or Low-SF condition (ts < 2.46, ps > .07). The High-SF condition did show a higher response than the Low-SF condition in V3B, IPS1, and inferior IPS (ts > 2.6, ps < .05). This difference was marginally significant in IPS2 and superior IPS (ts = 2.4, ps = .056) but did not reach significance in V3A, IPS0, IPS3, and IPS4 (ts < 1.4, ps > .24).
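For readers unfamiliar with the Benjamini–Hochberg correction mentioned above, the step-up procedure can be sketched as follows. The function names and the example p values are illustrative assumptions and are not taken from the study's data.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which tests survive FDR control at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant

def bh_adjusted(p_values):
    """Return BH-adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up p values, one per hypothetical brain region.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
example_sig = benjamini_hochberg(example_p)
example_adj = bh_adjusted(example_p)
```

Because the rule is step-up, every test ranked at or below the cutoff is declared significant, even if its own p value exceeds its own rank threshold.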
Overall, in early visual and ventral regions, the high-SF component of an object image elicited significantly higher response amplitudes than the low-SF component. This is consistent with prior findings of Fintzi and Mahon (2013). In dorsal regions, the high-SF component also elicited higher responses than the low-SF component in a number of regions. The dorsal regions certainly did not show a preference for the low-SF component of an object image as would be predicted by the direct LGN input they receive.
To examine whether or not responses to the two SF components would vary across object categories, in each ROI we conducted a repeated-measures ANOVA with SF component (High-SF and Low-SF) and Object category as factors. Results showed a significant effect of Object category in almost all the ROIs (Fs > 2.78, ps < .03), except for VOT and IPS2 (Fs < 1.41, ps > .25). To compare the response amplitudes between pairs of categories, for brevity, we focused our analysis on three representative regions: V1, VOT, and superior IPS (see Methods for why these three regions were chosen). In V1, the category effect was driven by a higher response to houses than to all the other object categories (ts > 4.2, ps < .01, corrected for all possible pairwise category comparisons; this applies to all subsequent comparisons between pairs of categories) and no difference between any other pairs of categories (ts < 1.76, ps > .1). In VOT and superior IPS, there was no difference between the univariate responses for pairs of categories (ts < 3.03, ps > .1). Thus, although a significant category effect existed in most of the regions examined, the effect was small and the nature of the effect varied across brain regions. Importantly, there was no interaction between object category and SF component in any of the ROIs examined (Fs < 2.98, ps > .13). This shows that the differential fMRI response amplitudes to the two SF components observed were not driven by any particular object category or categories used in the experiment.
MVPA Decoding of Object Categories
We used MVPA to examine how object category decoding would be affected by the different SF components of an image. To account for response amplitude differences among the different SF conditions, we used a z-normalization procedure (see Methods) to equate the average responses across ROIs before conducting MVPA.
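As a rough illustration of why z-normalization removes amplitude differences before MVPA, the sketch below z-scores a single voxel pattern across voxels. This assumes the normalization is applied to each condition's voxel pattern within an ROI; the Methods define the actual procedure, and the function name and values here are hypothetical.

```python
from statistics import mean, pstdev

def z_normalize(pattern):
    """Z-score one voxel pattern so it has mean 0 and unit standard deviation."""
    mu = mean(pattern)
    sd = pstdev(pattern)  # population standard deviation across voxels
    return [(v - mu) / sd for v in pattern]

# Two made-up patterns with the same shape but different overall amplitude.
weak_response = [1.0, 2.0, 3.0]
strong_response = [2.0, 4.0, 6.0]
z_weak = z_normalize(weak_response)
z_strong = z_normalize(strong_response)
```

After normalization the two patterns are identical, so a classifier can no longer exploit the difference in mean response amplitude and must rely on the relative pattern across voxels.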
All ROIs examined in early visual, ventral, and dorsal regions showed above chance category decoding accuracy for all three SF conditions: t(9) = 2.04, p = .07 for IPS4 for the Low-SF condition and ts > 2.51, ps < .035 for everything else (see Figure 4A). Pairwise comparisons revealed that decoding accuracy was higher for the Full-SF condition than for either the High-SF or Low-SF condition in most of the regions (ts > 2.75, ps < .05), except for IPS2, IPS4, and superior IPS that did not show either of these differences and IPS0–IPS1 that did not show a difference between the Full-SF and High-SF conditions (ts < 2.2, ps > .08). Critically, decoding accuracy did not differ between the High-SF and Low-SF conditions in any of the early visual, ventral, or dorsal regions examined (ts < 1.66, ps > .52).
These results show that significant category representations exist in early visual, ventral, and dorsal regions regardless of the SF component of an image. In early visual, ventral, and lower dorsal regions, the Full-SF condition resulted in better category representation than either the Low-SF or High-SF condition. However, this difference was not found in higher dorsal regions. Importantly, a difference in object representation strength was not found between the Low-SF and High-SF conditions in any of the regions examined. This indicated that the robustness of object category representation was not affected by the different SF components of an image in early visual, ventral, or dorsal regions; in addition, there was no systematic difference between the ventral and dorsal regions in their patterns of object responses to the high- and low-SF components of object images (Figure 4A).
MVPA Cross-SF Decoding of Object Categories
To examine whether or not object category representations formed from the low-SF image components were the same as those from the high-SF image components, we performed cross-SF object category decoding. Because the Full-SF condition shared SF components with both the High-SF and Low-SF conditions, potentially yielding an uninformative cross-SF decoding, we removed that condition from this analysis. We trained a classifier with the High-SF condition and tested it with the Low-SF condition and vice versa.
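The train-on-one-SF, test-on-the-other logic can be sketched as follows. This is an illustration only: it swaps in a simple correlation-based nearest-centroid classifier and a multi-way classification as stand-ins, which are not claimed to match the classifier or the pairwise scheme used in the study, and all names and toy patterns are hypothetical.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equal-length voxel patterns."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def centroids(patterns):
    """Map each category label to the mean of its voxel patterns."""
    return {label: [mean(col) for col in zip(*runs)] for label, runs in patterns.items()}

def decode_accuracy(train, test):
    """Fit centroids on one SF condition and classify patterns from the other."""
    cents = centroids(train)
    correct = total = 0
    for label, runs in test.items():
        for pattern in runs:
            predicted = max(cents, key=lambda c: pearson(cents[c], pattern))
            correct += predicted == label
            total += 1
    return correct / total

def cross_sf_accuracy(high_sf, low_sf):
    """Average of train-High/test-Low and train-Low/test-High accuracy."""
    return (decode_accuracy(high_sf, low_sf) + decode_accuracy(low_sf, high_sf)) / 2

# Toy patterns (4 voxels, 2 runs per category); Low-SF has a shifted amplitude.
high_sf = {"face": [[1.0, 0.0, 0.2, 0.1], [0.9, 0.1, 0.3, 0.0]],
           "house": [[0.0, 1.0, 0.1, 0.3], [0.1, 0.8, 0.0, 0.2]]}
low_sf = {"face": [[2.0, 1.1, 1.2, 1.0], [1.9, 1.0, 1.3, 1.1]],
          "house": [[1.0, 2.1, 1.1, 1.2], [1.1, 1.9, 1.0, 1.3]]}
accuracy = cross_sf_accuracy(high_sf, low_sf)
```

Above-chance accuracy in such a scheme indicates that the category-discriminating pattern learned from one SF component carries over to the other.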
We found above-chance cross-SF object category decoding in almost all the ROIs examined (ts > 2.6, ps < .03) with the exception of IPS4, which showed a nonsignificant trend, t(9) = 1.74, p = .12. In other words, in early visual, ventral, and dorsal regions, object category representations from the high- and low-SF components of object images were similar enough to yield above-chance cross-SF decoding (Figure 4B). To quantify the amount of similarity, we next compared cross-SF decoding accuracy with that of within-SF decoding (in which classifier training and testing were carried out within the same SF and then averaged across the High-SF and Low-SF conditions). Cross-SF decoding showed a significant drop in accuracy compared with within-SF decoding in V4, LOT, and VOT as well as V3A and inferior IPS (ts > 2.99, ps < .05). This difference, however, was not significant in V1–V3, V3B, IPS0–IPS4, and superior IPS (ts < 2.26, ps > .12). Thus, although object category representations formed from the low- and high-SF components of the object images did differ somewhat from each other in ventral regions and lower dorsal regions, this difference was not found in early visual and higher dorsal regions (Figure 4B).
MVPA Decoding of the Different SF Components
Here, we examined whether the difference between the different SF components of an object image was directly represented in a given ROI. We measured the decoding accuracy for discriminating the high- and low-SF image components of each object category and averaged the results across object categories to calculate the decoding accuracy of the SF component in each ROI (Figure 4C). SF component decoding accuracy was significantly above chance for most of the ROIs examined (ts > 2.25, ps < .05), except for IPS2–IPS4 (ts < 1.08, ps > .35). These results demonstrated that, despite the presence of robust object representations that were generalizable across SF components, differences between the different SF components were also represented in early visual as well as ventral and some dorsal regions.
Comparing Object Category and SF Component Representations
To understand how object categories and SF components jointly determine the representational structure of a brain region, in this analysis, we directly contrasted the strength of object category representations within and between different SF components. We first directly visualized the similarity between object category representations in a given brain region across the SF components. To do so, from the pairwise category decoding accuracies within and between SF conditions, we constructed a representational similarity matrix for the six object categories across the Low- and High-SF conditions. We then performed dimensionality reduction using an MDS analysis (Shepard, 1980) and projected the first two dimensions of the representational similarity matrix onto a 2-D space with the distance between the categories denoting their relative similarities to each other. To compare representations in early visual and higher ventral and dorsal regions, we performed this analysis on three representative regions: V1, VOT, and superior IPS (see Methods for why these three regions were chosen). The two dimensions of MDS explained 89%, 80%, and 83% of the variance in the three regions, respectively. In V1, a difference in SF component separated the object categories into two distinctive clusters, such that an object category was closer to the other categories sharing the same SF component than to the same category shown in another SF component. This clear separation by SF component, however, was not seen in either VOT or superior IPS. In those two brain regions, object categories largely overlapped across the two SF components with only a small shift between the categories from the two SF components (Figure 5A).
To quantify this observation, we compared the decoding of an object category from the other categories sharing the same SF component (within-SF category decoding) to the decoding of an object category from the other categories from the opposite SF component (between-SF category decoding). Of all the ROIs examined, most showed higher between- than within-SF category decoding (ts > 2.93, ps < .05), except for IPS3 and IPS4 that showed a marginally significant difference (ts < 2.24, ps > .055) and IPS2 that showed no difference, t(9) = 1.6, p = .14 (see Figure 5B). Focusing on the three representative brain regions, the shift between the object category representation in the two SF components (i.e., the difference of between- and within-SF category decoding) was greater in V1 than in either VOT, t(9) = 3.62, p < .01, or superior IPS, t(9) = 7.46, p < .001, with no difference between the latter two, t(9) = 1.68, p = .13. This indicated that the SF component played a more prominent role in shaping the representational structure of V1 than either VOT or superior IPS, confirming the MDS results shown in Figure 5A. Moreover, these results showed that differences in SF components affected the representational structure in a similar way in both VOT, a higher ventral region, and superior IPS, a higher dorsal region.
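The within- versus between-SF comparison can be sketched as follows, assuming the pairwise decoding accuracies are stored in a dictionary keyed by pairs of (category, SF) conditions. The data layout, function name, and accuracy values are hypothetical and serve only to make the definition of the shift concrete.

```python
from itertools import combinations
from statistics import mean

def sf_shift(pairwise_acc, categories, sfs=("high", "low")):
    """Return (within-SF, between-SF, between minus within) mean category decoding."""
    within, between = [], []
    conditions = [(c, s) for c in categories for s in sfs]
    for a, b in combinations(conditions, 2):
        if a[0] == b[0]:
            continue  # same category across SFs is SF decoding, not category decoding
        acc = pairwise_acc[frozenset((a, b))]
        (within if a[1] == b[1] else between).append(acc)
    return mean(within), mean(between), mean(between) - mean(within)

# Made-up pairwise accuracies for two categories and two SF components.
toy_acc = {
    frozenset((("face", "high"), ("house", "high"))): 0.70,  # within-SF
    frozenset((("face", "low"), ("house", "low"))): 0.74,    # within-SF
    frozenset((("face", "high"), ("house", "low"))): 0.90,   # between-SF
    frozenset((("face", "low"), ("house", "high"))): 0.86,   # between-SF
}
within_acc, between_acc, shift = sf_shift(toy_acc, ["face", "house"])
```

A larger shift means that a change in SF component moves a category's representation further from the other categories, as the text reports for V1 relative to VOT and superior IPS.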
Hit rates were overall high for the 1-back repetition detection task performed on the images of the six categories (86.7 ± 0.14%). There was a significant effect of SF, F(2, 18) = 4.4, p < .05, a marginally significant effect of category, F(5, 45) = 2.19, p = .07, but no interaction between the two, F(5, 45) = 0.41, p = .9. The effect of SF was driven by a higher hit rate in the Full-SF condition than the High-SF condition, t(9) = 2.34, p < .05, and a marginally lower hit rate in the High-SF condition than the Low-SF condition, t(9) = 2.03, p = .07, with no difference between the Full-SF and Low-SF conditions, t(9) = 0.57, p = .58. Post hoc tests showed no difference in hit rates between pairs of categories, t(9) < 2.6, p > .26, corrected. Our failure to find higher fMRI object category decoding in the Low-SF condition than in the High-SF condition in the dorsal regions thus cannot be due to the poorer encoding of the Low-SF condition than the High-SF condition; if anything, the behavioral performance showed the opposite trend.
To ensure proper fixation during the MRI scan session, we monitored eye movements and collected eye position data. The results indicated that participants were able to maintain central fixation throughout the experiment. Across experiments and conditions, the deviation in eye position did not exceed 0.76° in either the horizontal or vertical direction. A two-way ANOVA showed no significant effect of SF and Category and no interaction between the two on the horizontal deviation of the eye (Fs < 1.3, ps > .2). There was a significant effect of Category on vertical eye deviation, F(5, 45) = 4.4, p < .01, but no significant effect of SF and no interaction between the two (Fs < 0.9, ps > .4). The effect of Category on vertical eye deviation was driven by slightly larger deviations for elephants than for chairs, faces, houses, and cars (ts > 2.5, ps < .05), with no difference between any other pairs of categories (ts < 2.6, ps > .09). This result could be due to the fact that the exemplars in the elephant category tended to cover a larger area than those in the other categories, although this result was not observed in our previous work (Vaziri-Pashkam & Xu, 2017). In any case, when we removed elephants from the fMRI analysis, all the results remained qualitatively the same.
In this study, using fMRI MVPA, we examined how different SF components of an object image would affect object category representation in human early visual, ventral, and dorsal regions. Although the different SF components of an object image did modulate fMRI response amplitude in some brain regions, they had no measurable effect on object category decoding in any of the regions examined.
In early visual areas, the high-SF component of a natural object image elicited higher response amplitudes than the low-SF component of the image. Nevertheless, fMRI MVPA revealed equally strong object category decoding from both the high- and low-SF components of the object images. Comparison between within-SF and cross-SF object category decoding further revealed that the object category representations formed from high-SF and low-SF object image components did not differ from each other. Thus, despite differences in response amplitude, object category representations in early visual areas appear to be invariant (i.e., showing complete tolerance) to the different SF components of an image. Besides object category information, the difference between the high- and low-SF components of an object image can also be decoded. Using MDS and by directly comparing object category and SF component representations in V1 (a representative early visual area), we further showed that object category representations are clustered according to SF components, making SF component a key feature in determining the neural representational structure of early visual areas.
In ventral regions, as in early visual areas, the high-SF component of an object image also elicited higher response amplitudes than the low-SF component of the image. This replicated previous reports in ventral regions for visual objects (Fintzi & Mahon, 2013; Iidaka et al., 2004). However, MVPA decoding revealed equally strong object decoding for both the high- and low-SF components of the object images. Comparison between within-SF and cross-SF object category decoding revealed that object category representations in ventral regions exhibited a fair amount of tolerance to the different SF components of the image, but there were still differences between representations formed from the high- and low-SF components of the images. Besides object category information, the difference between the high- and low-SF components of an object image can also be decoded. This resulted in a significant but small shift in object category representation across the SF components as revealed by our MDS and decoding analyses in VOT (a representative ventral region). This shift, however, was significantly smaller in VOT than in V1. Consequently, there was still a large amount of overlap of object category representations from the high- and low-SF components in the representational structure. This indicated that object category, and not the SF component, played a more prominent role in determining the representational structure of ventral regions.
In dorsal regions, the high-SF component of an object image elicited higher response amplitude than the low-SF component of the image in only a few dorsal regions, with equally strong object decoding found for both the high- and low-SF components of object images in all dorsal regions. Although object category representations in lower dorsal regions showed incomplete tolerance across the different SF components (similar to those found in ventral regions), those in higher dorsal regions showed a complete tolerance across the different SF components (just as those found in early visual areas). The difference between the high- and low-SF components of an object image was represented only in some of the dorsal regions. This could be seen in the significant but small shift in object category representation across the two SF components in our MDS and decoding analyses in superior IPS (a representative dorsal region). This shift was comparable to that observed in VOT and was significantly smaller than that observed in V1. Consequently, as in VOT, there was still a large amount of overlap of object category representations from the different SF components in the representational structure in superior IPS. Thus, just like in ventral regions, object category, and not SF component, appeared to play a more prominent role in determining the representational structure of dorsal regions.
Overall, comparisons between ventral and dorsal regions showed that it was certainly not the case that the high- and low-SF image components were represented differently in these different brain regions. Although object category representations formed in early visual areas were separated according to the SF components of the images, in ventral and dorsal regions this separation was much smaller and resulted in object category representations largely overlapping across SF components. Given that both ventral and dorsal regions showed a fair amount of tolerance across the different SF components, this suggests that high-level visual representations, rather than low-level ones, were formed in these brain regions.
Despite the difference in the input that ventral and dorsal regions receive from magnocellular and parvocellular systems (Ferrera et al., 1994; Merigan & Maunsell, 1993; Livingstone & Hubel, 1988; DeYoe & Van Essen, 1985; Shipp & Zeki, 1985), there exist extensive anatomical connections among early visual, ventral, and dorsal regions in both the monkey and human brains, allowing rapid information exchange between these regions. In macaques, the lateral intraparietal area is known to receive strong visual input from multiple visual areas, including early visual areas V2, V3, V3A, and V4; the middle temporal area MT; and inferior temporal areas TEO and TE (Lewis & Van Essen, 2000; Webster, Bachevalier, & Ungerleider, 1994; Felleman & Van Essen, 1991). Visual areas V1, V2, V3, and V3A are also reciprocally connected to area V6, which in turn is connected to IPS regions and V6A (Galletti et al., 2001). In humans, strong connections have been found between posterior IPS regions and extrastriate visual regions (Bray, Arnold, Iaria, & MacQueen, 2013; Greenberg et al., 2012; Mars et al., 2011). There also exists a major white matter pathway, the vertical occipital fasciculus, that connects hV4/VO-1 and V3A/V3B topographic maps (Takemura et al., 2015; Yeatman, Dougherty, Ben-Shachar, & Wandell, 2012). This pathway likely plays an important role in information transmission between ventral and dorsal regions regarding object form, identity, color, and location. Our finding of similar tolerance across the different SF components of an object image in both ventral and dorsal regions is consistent with these extensive anatomical connections between the two sets of brain regions. It is also in line with a recent review paper that argued that spatial frequency sensitivity may not be useful as a tool for dissociating the magnocellular and parvocellular systems or the ventral and dorsal streams when the stimuli are suprathreshold and achromatic (Skottun, 2015).
Previous studies reported that dorsal regions such as V3A exhibited higher response amplitudes for low- than high-SF images (Tootell & Nasr, 2017; Henriksson et al., 2008; Singh et al., 2000). Here, we did not replicate these findings. If anything, we found either no difference or an opposite effect in the dorsal regions. This is likely because previous studies used simple gratings whereas we used real-world object images. Further work is needed to explore this possibility.
Previous studies have also shown that task can modulate the representations in the dorsal stream (Bracci et al., 2017; Vaziri-Pashkam & Xu, 2017). It is thus possible that the SF-independent object category representation in the dorsal pathway we observed here was evoked by our 1-back repetition detection task. However, if this was indeed the case, it would mean that the SF response function of the dorsal pathway could be modulated by a task that was completely orthogonal to SF. This is somewhat unlikely in our view. With our study, we can certainly rule out an account in which the dorsal pathway is wired to only represent the low-SF component of the visual input. Thus, the general claim that the dorsal pathway shows a preference for low-SF stimuli does not hold. Visual perception typically involves an observer attending to and extracting useful information from the visual environment. In this regard, our 1-back task captures several important aspects of visual perception. Even if the SF response function of the dorsal pathway can be modulated by task, the fact that it is SF-independent in our 1-back task suggests that such modulation likely plays a relatively minor role in determining the contribution of the dorsal pathway to visual perception.
Although real-world objects never appear in the form of line drawings, we have no trouble recognizing line-drawing objects and relating these objects to their real-world counterparts. Line-drawing objects resemble the high-SF component of the images. The ability of early visual, ventral, and dorsal regions to tolerate changes across SF components likely mediates and supports our ability to process line-drawings.
To conclude, using fMRI MVPA, we found that early visual, ventral, and dorsal regions all showed robust object category decoding for the full spectrum and the high- and low-SF components of the natural and manmade object images used. Moreover, all of these regions also exhibited substantial tolerance in their object category representation across the different SF components of an object image despite their ability to discriminate different SF components. Whereas SF component was a prominent feature in determining the overall representational structure in early visual areas, it played a much smaller role in ventral and dorsal regions. In those regions, variations among the object categories more prominently shaped the representational structures. Our findings show that ventral and dorsal regions are similar in how they represent visual input from different spatial scales and argue against a dissociation of these regions based on differential sensitivity to the SF components of an image. This study joins a number of previous studies showing that the difference in visual representations between ventral and dorsal regions does not lie in their responses to different low-level features (Vaziri-Pashkam & Xu, in press; Konen & Kastner, 2008; Sawamura et al., 2005). Rather, as shown in our previous studies (Vaziri-Pashkam & Xu, 2017, in press), the ventral–dorsal distinction may be driven by the neural representational schemes used by the two pathways as well as how task and attention affect the representations. The collective evidence supports the distinctive roles these two pathways may play in the invariant versus adaptive aspect of visual information processing (Xu, in press).
We thank Katherine Bettencourt for her assistance in localizing the parietal topographic maps. This research was supported by NIH grant 1R01EY022355 to Y. X.
Reprint requests should be sent to Maryam Vaziri-Pashkam, Laboratory of Brain and Cognition, National Institute of Mental Health, Room 4C-205, 10 Center Drive, Bethesda, MD, 20814, or via e-mail: email@example.com.