In monocular pattern rivalry, a composite image is shown to both eyes. The observer experiences perceptual alternations in which the two stimulus components alternate in clarity or salience. We used fMRI at 3T to image brain activity while participants perceived monocular rivalry passively or indicated their percepts with a task. The stimulus patterns were left/right oblique gratings, face/house composites, or a nonrivalrous control stimulus that did not support the perception of transparency or image segmentation. All stimuli were matched for luminance, contrast, and color. Compared with the control stimulus, the cortical activation for passive viewing of grating rivalry included dorsal and ventral extrastriate cortex, superior and inferior parietal regions, and multiple sites in frontal cortex. When the BOLD signal for the object rivalry task was compared with the grating rivalry task, a similar whole-brain network was engaged, but with significantly greater activity in extrastriate regions, including V3, V3A, fusiform face area (FFA), and parahippocampal place area (PPA). In addition, for the object rivalry task, FFA activity was significantly greater during face-dominant periods whereas PPA activity was greater during house-dominant periods. Our results demonstrate that slight stimulus changes that trigger monocular rivalry recruit a large whole-brain network, as previously identified for other forms of bistability. Moreover, the results indicate that rivalry for complex object stimuli preferentially engages extrastriate cortex. We also establish that even with natural viewing conditions, endogenous attentional fluctuations in monocular pattern rivalry will differentially drive object-category-specific cortex, similar to binocular rivalry, but without complete suppression of the nondominant image.
Multistable images provide important examples of dynamic conscious visual perception despite a static stimulus (Hupé & Rubin, 2003; Lee & Blake, 2002; Ramachandran & Anstis, 1985; Rubin, 1921; Necker, 1832). Perhaps the most prominent example of bistability is binocular rivalry, which has been studied extensively in recent years (Blake & Wilson, 2011; Tong, Meng, & Blake, 2006; Blake & Logothetis, 2002). In binocular rivalry, perception alternates between incompatible images (e.g., left and right oriented gratings) presented to each eye. At any given moment, one image may be dominant whereas the other is partly or completely suppressed (Blake & Logothetis, 2002). Monocular pattern rivalry is a related form of bistability, in which the two incompatible images are superimposed and identical images are shown to both eyes (Breese, 1899). The observer experiences alternations between different perceptual representations of the same image that are described as changes in clarity or salience of the two stimulus components in the composite image, as opposed to the near complete reduction in visibility that accompanies suppression in binocular rivalry (O'Shea, Parker, La Rooy, & Alais, 2009; Freeman, Nguyen, & Alais, 2005; Boutet & Chaudhuri, 2001).
A large number of studies have characterized the neural substrates of binocular or monocular pattern rivalry and have found that competition between visual representations occurs at different levels in the visual pathway (e.g., Buckthought, Jessula, & Mendola, 2011; Buckthought & Mendola, 2011; Brouwer, Tong, Hagoort, & van Ee, 2009; Raemaekers, van der Schaaf, van Ee, & van Wezel, 2009; Sterzer & Kleinschmidt, 2007; Brouwer, van Ee, & Schwarzbach, 2005; Haynes & Rees, 2005; Lee & Blake, 2002; Tong & Engel, 2001; Polonsky, Blake, Braun, & Heeger, 2000; Kleinschmidt, Buchel, Zeki, & Frackowiak, 1998; Lumer, Friston, & Rees, 1998; Tong, Nakayama, Vaughan, & Kanwisher, 1998). In the case of binocular rivalry, it has been shown that activation in early visual areas (V1, V2, V3; Lee & Blake, 2002; Tong & Engel, 2001; Polonsky et al., 2000) correlates with perceptual alternations of simple oriented grating patterns, consistent with models of interocular competition between (possibly monocular) orientation selective neurons (Wilson, 2007; Freeman, 2005). For example, when a high-contrast right-oriented grating is shown to one eye and a low-contrast left-oriented grating is shown to the other eye, then the BOLD signal in V1 increases during periods when the higher contrast grating is perceived and decreases when the lower contrast grating is perceived (Polonsky et al., 2000). Nevertheless, other psychophysical and neurophysiological evidence suggests that binocular rivalry also involves competition between high-level pattern representations, even if the patterns are intermingled in the two eyes (Haynes & Rees, 2005; Tong et al., 1998; Sheinberg & Logothetis, 1997; Kovacs, Papathomas, Yang, & Feher, 1996; Leopold & Logothetis, 1996; Logothetis, Leopold, & Sheinberg, 1996). The combined results indicate that neural competition in binocular rivalry is not limited to interocular inhibition but involves competition between image features. 
In a similar manner, monocular rivalry involves competition between the representations of the superimposed image features, but without the interocular inhibition and suppression present in binocular rivalry (Buckthought et al., 2011; O'Shea et al., 2009).
All these studies are relevant to the debate over the neural correlates of rivalry and have contributed to the current view that competition between alternate representations occurs at multiple levels in the hierarchy of visual cortical areas (Tong et al., 2006; Freeman, 2005; Meng & Tong, 2004; Wilson, 2003). Indeed, an fMRI study of binocular rivalry with face and house stimuli found that coordinated fluctuations in activation in human fusiform face area (FFA) and parahippocampal place area (PPA) accompanied perceptual alternations (Tong et al., 1998). In addition, recent psychophysical evidence suggests that the mechanisms underlying rivalry may differ across stimulus type (Sandberg, Bahrami, Lindelov, Overgaard, & Rees, 2011; Quinn & Arnold, 2010; van Boxtel, Alais, & van Ee, 2008; Alais & Melcher, 2007; Alais & Parker, 2006; Freeman, 2005). For instance, complex images such as faces can show deeper suppression of the nondominant image and more spatial coherence (less patchiness) in the dominant percepts than gratings (Alais & Melcher, 2007). It is thus important to directly compare perceptual rivalry for simple and more complex features (gratings and objects) physiologically to reveal if neural competition occurs at different processing levels in the visual system. An fMRI contrast between simple grating patterns and complex objects, using equivalent stimulus parameters and task demands, has not been reported.
In the present experiments, we make this comparison by using the monocular rivalry paradigm to manipulate competition between superimposed image features, but without the interocular inhibition present in binocular rivalry. We also employed a novel nonrivalrous control stimulus (Figure 1B) that serves to isolate the brain activity related to rivalry per se, as well as the perceptual image segmentation that enables it (Metelli, 1970). Participants either viewed the rivalry passively or performed a task to measure alternation rates. We used both simple (orthogonal gratings) and more complex stimuli (face/house composites; Figure 1C, D). Colored stimuli were used to enhance the percept of monocular rivalry (O'Shea et al., 2009; Kitterle & Thomas, 1980), and mean luminance and contrast were matched across all stimuli.
One author (A. B.) and five participants who were naive as to the hypotheses of the study took part in all experiments. The volunteers (two of whom were women) were university students or postdoctoral fellows. All were right-handed and had normal or corrected-to-normal acuity and stereoacuity thresholds better than 30 sec arc, measured using the Titmus stereo test (Stereo Optical Co., Chicago, IL). The participants provided informed written consent and were remunerated for their time. The experiments were approved by the research ethics board of the McGill University Health Centre (Protocol NEU-08-03).
All stimuli were presented on a MacBook Pro laptop (Intel Core 2 Duo) with 1024 × 768 resolution, 120 Hz refresh rate with 8 bit/pixel grayscale, which was gamma-corrected using a color look-up table. After calibration, the stimulus had a mean luminance of 30 cd/m2 and a peak luminance of 60 cd/m2. Stimuli were generated and displayed using Matlab (2007b) and Psychtoolbox Version 3 (PTB-3) software and a Matrox (Dual Head 2Go Analogue Edition, Dorval, Quebec, CA) splitter graphics card. Dual LCD (InFocus LP 540, Dorval, Quebec, CA) projectors and linear polarizers were used for dichoptic projection (Thompson, Farivar, Hansen, & Hess, 2008). The participants wore linear polarizers with complementary polarization on their eyepieces. Note that dichoptic presentation was not necessary to show monocular rivalry stimuli, but it was used to present binocular rivalry stimuli in the same scans (results not discussed in this article). For the conditions presented here, identical images were presented to each eye. The stimuli were back-projected from an LCD projector onto a screen at the rear end of the MR scanner bore at a viewing distance of 134 cm, and participants viewed stimuli through a mirror attached to the head coil. The same display equipment and conditions were used both for fMRI scan sessions and psychophysical sessions, including the same viewing distances. Throughout the experiments, each stimulus was projected through an opaque rectangular aperture (5.1° height × 3.7° width), which minimized any edge disparities.
The face stimuli were produced from a database of grayscale photographs, provided by Hugh R. Wilson (Loffler, Yourganov, Wilkinson, & Wilson, 2005). The photographs of houses were provided by a local building company (Les Immeubles Mar-vo, Inc., Fabreville, Laval, Quebec). The house stimuli were chosen to have the same aspect ratios as the faces. Each house stimulus was resized slightly to match the dimensions of the face in the face/house composite pair. The houses and faces were initially equated for mean luminance and 80% RMS contrast, then converted to two-tone images, with the same proportion of pixels at the high and low luminance (Figure 1), and equated for Michelson contrast and mean luminance. Each house or face was 3.0° (width) × 4.0° (height).
Oblique Left/Right Grating Stimuli
The oblique left/right gratings were sinusoidal grating stimuli converted to two-tone images with the same proportion of pixels at the high and low luminance as in the face or house stimuli, thus equating them for Michelson contrast and mean luminance. Orthogonal orientations were used (45°, −45°; 60°, −30°; and 75°, −15°).
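The two-tone conversion described above can be sketched as follows. This is only an illustration, not the authors' Matlab/Psychtoolbox code: the image size, spatial frequency, and the `high_fraction` parameter are hypothetical, and treating the 18% figure as the Michelson contrast of a two-level image around the 30 cd/m2 mean is an assumption.

```python
import math

def two_tone_grating(size=64, cycles=4.0, angle_deg=45.0, high_fraction=0.5):
    """Return a size x size grid of 0/1 pixels from a thresholded sinusoid."""
    theta = math.radians(angle_deg)
    values = [
        [math.sin(2 * math.pi * cycles
                  * (x * math.cos(theta) + y * math.sin(theta)) / size)
         for x in range(size)]
        for y in range(size)
    ]
    # Pick the cut so that roughly `high_fraction` of the pixels end up high.
    flat = sorted(v for row in values for v in row)
    index = min(len(flat) - 1, int(round((1.0 - high_fraction) * len(flat))))
    cut = flat[index]
    return [[1 if v >= cut else 0 for v in row] for row in values]

def to_luminance(pixel, mean_lum=30.0, contrast=0.18):
    """Map a 0/1 pixel to a luminance (cd/m2) placed symmetrically about the mean."""
    return mean_lum * (1 + contrast) if pixel else mean_lum * (1 - contrast)

def michelson(l_max, l_min):
    """Michelson contrast of a two-level image."""
    return (l_max - l_min) / (l_max + l_min)

grating = two_tone_grating(angle_deg=45.0)
high_share = sum(map(sum, grating)) / (64 * 64)
contrast = michelson(to_luminance(1), to_luminance(0))
```

Because every stimulus is reduced to the same two luminance levels in the same proportions, gratings and face/house images built this way share mean luminance and Michelson contrast by construction.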
Control Condition for Monocular Rivalry
A control condition for the monocular rivalry grating stimulus was created by increasing the luminance of half of the X-junctions (rendered in gray) above the luminance of the red and green orthogonal grating contours, which violates Metelli's law for transparency (Metelli, 1970; Figure 1B). The other half of the X-junctions were reduced in luminance by the same amount. This made the monocular rivalry alternations hardly perceptible. This manipulation also had the effect of increasing the contrast of the image slightly (i.e., 18–21%), but the effects on the alternations could not be attributed to the slight increase in contrast (because an increase in contrast would be expected to speed alternations up slightly, not slow them down).
Both the grating and face/house stimuli were colored red/green to enhance the perception of monocular rivalry and were presented on a yellow background at 18% contrast. In all of the psychophysical tests and fMRI runs, color-counterbalanced versions of the stimuli were used (i.e., each stimulus component could be red or green). Red–green isoluminance was confirmed for each subject individually using a minimum motion technique (Cavanagh, Tyler, & Favreau, 1984) for gratings viewed binocularly with the same mean luminance and chromaticity as in the main experiment; none of the subjects required different luminances for the red and green gratings.
Alternation rates were measured for monocular rivalry with the left/right oblique grating and face/house stimuli. Participants reported perceptual alternations continuously over each 90-sec trial, alternately pressing two different keys. Subjects pressed the first key once a particular stimulus component appeared to be at least twice as clear as the other or was exclusively visible over at least two thirds of the stimulus (the same criterion for visibility as used by O'Shea et al., 2009), or if a composite was perceived (i.e., neither component was more prominent). Participants pressed the second key once the other stimulus component appeared to be at least twice as clear or exclusively visible over at least two thirds of the stimulus, using the same criterion.
Suppression Test (Visibility)
The monocular rivalry stimulus was presented on the left side of the screen at 18% contrast (reference condition). An image on the right side of the screen (test condition) displayed only one stimulus component (e.g., the house only in the case of the face/house composite). The participants adjusted the contrast of the test image until it matched the apparent contrast in the reference image when that component appeared to have the lowest contrast during monocular rivalry alternations. Participants were tested twice for each stimulus condition.
Acquisition of fMRI Data
All images were acquired using a 3-T MR scanner (Siemens, Trio, Germany) at the Montreal Neurological Institute, with a 32-channel head coil (20 channels for retinotopic mapping). Functional whole-brain images were acquired using a T2*-weighted gradient-echo, EPI sequence (38 slices, repetition time [TR] = 2500 msec, echo time [TE] = 30 msec, field of view [FOV] = 192, voxel size = 3 × 3 × 3 mm). Functional images for retinotopic mapping were acquired with a T2*-weighted sequence, with slices oriented perpendicular to the calcarine sulcus (28 slices, TR = 2000 msec, TE = 30 msec, FOV = 128, voxel size = 4 × 4 × 4 mm). Anatomical images were acquired by using a T1-weighted magnetization-prepared rapid gradient-echo sequence optimized for contrast between gray and white matter (176 slices, TR = 2300 msec, TE = 2.98 msec, FOV = 256, voxel size = 1 × 1 × 1 mm).
Monocular Rivalry Scans (Active Condition)
A block design was used, composed of 30 sec stimulus blocks. The first half of each scan consisted of blocks alternating between binocular and monocular rivalry in ascending contrasts (9%, 18%, 36%). Only the 18% contrast monocular rivalry condition is further discussed in this article; the results with binocular rivalry have been published previously (Buckthought et al., 2011). Participants used a button box to report when their dominant percept switched to that of a left oblique or right oblique grating (or face/house; following the procedure described above). During the second half of each scan, additional stimulus conditions were presented (rivalry replay, results not discussed here). Finally, the first and last block of each run were blank baseline blocks, so that there were 14 blocks, for a total length of 420 sec. Each volunteer participated in four runs with the grating stimuli and four runs with the face/house stimuli. The runs were counterbalanced for color (e.g., the colors of the left and right oblique gratings or faces and houses were interchanged). Three sets of face/house stimuli were used, and the left/right oblique gratings were presented at three different orthogonal orientations (45°, −45°; 60°, −30°; and 75°, −15°).
Passive Viewing Scans
In the passive viewing scans, a block design was used, composed of 20-sec stimulus blocks. The monocular rivalry stimuli were identical to those in the scans with a task, but only gratings were used, at 18% contrast. In addition, a control condition for monocular rivalry was used in which the X-junctions were modified to violate Metelli's law for transparency, as shown in Figure 1B (Metelli, 1970). The scans also included three other stimulus conditions not discussed in the current article (binocular rivalry as well as left and right oblique gratings). Four repetitions of each of these five block types were shown, and the first and last blocks of each run were blank baseline, for a total of 440 sec. Each subject participated in two of these runs, which were counterbalanced for color (i.e., the colors of the left and right oblique gratings were interchanged).
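The run lengths quoted for the task and passive scans follow directly from the block counts. The short check below makes the arithmetic explicit; the nominal volume counts are an inference from the 2500-msec TR and are not stated in the text.

```python
TR_SEC = 2.5  # whole-brain EPI repetition time given in the acquisition section

def run_duration(stimulus_blocks, baseline_blocks, block_sec):
    """Total run length in seconds for equal-length blocks."""
    return (stimulus_blocks + baseline_blocks) * block_sec

# Passive runs: 5 block types x 4 repetitions, plus 2 baseline blocks, 20 s each.
passive_sec = run_duration(5 * 4, 2, 20)
# Task runs: 14 blocks of 30 s in total, two of which are baseline blocks.
active_sec = run_duration(12, 2, 30)

# Nominal number of volumes per run (an inference, before any volumes are discarded).
passive_volumes = passive_sec / TR_SEC
active_volumes = active_sec / TR_SEC
```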
Face/House Localizer Scans
Alternating sequences (lasting 20 sec) of nonrivalrous faces or houses were presented in separate blocks (500 msec per stimulus) to functionally localize each participant's FFA and PPA. To maintain attention, participants performed a 1-back task, that is, they indicated by a button press when the same stimulus appeared twice in succession, which happened once per block at a variable time point. Each participant was tested with one run in which the original grayscale photographs of the faces and houses were used and a second run in which the red or green two-tone versions were used. The results obtained were similar, so these two runs were averaged to define the FFA and PPA.
Retinotopic Mapping and MT+ Localizer Scans
Retinotopic mapping was carried out in a separate session. The stimuli for retinotopic mapping consisted of high contrast, chromatic, flickering checkerboard patterns of two specific types. The rotating wedge stimulus swept through polar angles, and the expanding/contracting ring stimulus mapped eccentricity. There were four acquisition runs for each subject, comprising eccentricity (fovea to periphery and vice versa) and polar mapping (clockwise and counterclockwise) runs. The polar mapping runs consisted of eight cycles (full hemifield rotation of both wedges), lasting a total of 512 sec. The eccentricity mapping runs consisted of eight cycles of expanding or contracting rings, lasting a total of 512 sec. Both stimuli attempted to compensate for the cortical magnification factor by increasing in size as they approached the periphery, and the eccentricity stimuli traversed space with a logarithmic transformation (Sereno et al., 1995). A central fixation marker was present at all times, and subjects were required to perform a task monitoring the orientation of the fixation marker to aid fixation stability. In addition, participants performed two runs of MT+ localization (256 sec) consisting of eight 16-sec epochs of low-contrast stationary rings and eight 16-sec epochs of moving rings (Tootell et al., 1995).
We used the BrainVoyager QX analysis package, version 188.8.131.520 (Brain Innovations, Maastricht, The Netherlands) for most functional data analyses as well as for the creation of inflated and flattened cortical representations. The freely available Freesurfer analysis package, version v4.5.0 (surfer.nmr.mgh.harvard.edu/), was found to be better for retinotopic mapping data analysis on the reconstructed inflated brain, and the identified retinotopic areas were transferred to BrainVoyager using anatomical landmarks.
The anatomical and functional scans were analyzed in BrainVoyager using the standard processing sequence in this software package, described as follows. The anatomical scans were used to create surface reconstructions of each subject's cerebral cortex. The computed cortical surface representation was inflated and then flattened. Each participant's reconstructed folded cortical representation was normalized to spherical coordinate space and aligned to a target brain (chosen as an individual participant) using cortex-based alignment (Frost & Goebel, 2012). The cortex-based alignment was performed to obtain a good match between corresponding brain regions for the group level statistical data analysis. Before analysis of the functional scans, the first two volumes of every scan were discarded. All functional images were subjected to a standard set of preprocessing steps: (1) 3-D motion correction, (2) slice timing correction, and (3) removal of low frequencies up to three cycles per scan (linear trend removal and high-pass filtering). No spatial smoothing was applied. Functional data were manually coregistered with the 3-D anatomical T1 scans. The 3-D anatomical scans were transformed into Talairach coordinate space using trilinear interpolation (Talairach & Tournoux, 1988), and the parameters for this transformation were subsequently applied to the coregistered functional data. A voxel-by-voxel, fixed effects general linear model was used for analysis, estimating the neural response as a boxcar function for each condition block, convolved with a standard hemodynamic response function (sum of two gamma functions). For the active scans, monocular rivalry was the predictor (along with binocular rivalry and replay not presented here). For the passive scans, monocular rivalry and the nonrivalrous control were the relevant predictors. 
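The block predictor described above (a boxcar convolved with a two-gamma hemodynamic response function) can be sketched in a few lines. This is not BrainVoyager's implementation: the gamma shapes (6 and 16) and the undershoot ratio (1/6) are commonly used defaults assumed here, and the single-predictor least-squares fit stands in for the full multi-condition model.

```python
import math

TR = 2.5          # seconds, from the acquisition parameters
BLOCK_VOLS = 12   # one 30-sec block at TR = 2.5 s

def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Sum of two gamma functions: a response peak minus a scaled undershoot."""
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, under_shape)

def make_predictor(boxcar, tr=TR, hrf_len_sec=30.0):
    """Convolve a 0/1 boxcar (one value per volume) with the sampled HRF."""
    kernel = [double_gamma_hrf(i * tr) for i in range(int(hrf_len_sec / tr) + 1)]
    return [
        sum(boxcar[n - k] * kernel[k] for k in range(len(kernel)) if n - k >= 0)
        for n in range(len(boxcar))
    ]

def fit_beta(y, x):
    """Ordinary least squares for y = beta * x + baseline."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    return beta, my - beta * mx

# Synthetic voxel: baseline of 100 plus a response of amplitude 3 to each block.
boxcar = ([0] * BLOCK_VOLS + [1] * BLOCK_VOLS) * 3 + [0] * BLOCK_VOLS
predictor = make_predictor(boxcar)
signal = [100.0 + 3.0 * p for p in predictor]
beta, baseline = fit_beta(signal, predictor)
```

With noise-free synthetic data the fit recovers the amplitude exactly; with real data the same beta estimates feed the t contrasts between conditions.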
For both scan types, the functional results were then viewed on an individual's cortical surface, producing maps of statistical significance (t tests corrected for multiple comparisons using the false discovery rate [FDR] method, p < .05), which were then spatially smoothed. In addition, we separately analyzed the BOLD signal changes within ROIs, using a fixed effects general linear model analysis.
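As an illustration of the FDR correction mentioned above, the sketch below implements the Benjamini-Hochberg step-up rule. The text says only "the FDR method", so equating it with Benjamini-Hochberg is an assumption, and the p values are invented for the example.

```python
def fdr_threshold(p_values, q=0.05):
    """Benjamini-Hochberg: largest sorted p(k) with p(k) <= q * k / m, else None."""
    ordered = sorted(p_values)
    m = len(ordered)
    cutoff = None
    for k, p in enumerate(ordered, start=1):
        if p <= q * k / m:
            cutoff = p
    return cutoff

# Made-up p values for ten voxels, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
cutoff = fdr_threshold(example_p)
survivors = [p for p in example_p if cutoff is not None and p <= cutoff]
```

Here only the two smallest p values survive, whereas an uncorrected p < .05 threshold would have passed five.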
The standard phase-lag analysis procedure was used to localize early visual areas (Sereno et al., 1995). To map visual space to corresponding cortical regions, cross-correlation analysis was used to identify the time lag at which a region responds maximally. These statistical maps were corrected for multiple comparisons using the FDR method (p < .05). Polar angle maps showed transitions in the color scale representing adjacent angles of a checkerboard wedge, visualized on the flattened cortical surface. Because early visual areas border each other with a mirrored representation of the visual field at the horizontal and vertical meridians, the boundaries of visual areas (V1, V2, V3, and V3A) were chosen by visual inspection at the turning points of the color scale. Foveal ROIs for V1, V2, and V3 were defined as the region of occipital pole (OP) activated in the central 2.9° of visual angle using the gradation of the color scale on the eccentricity maps.
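The lag search at the heart of the phase-lag analysis can be sketched as below. It is a simplified stand-in for the Freesurfer procedure: the reference waveform is a plain cosine at the stimulus frequency, the voxel time course is synthetic, and the cycle length (512 sec / 8 cycles = 64 sec, or 32 volumes at TR = 2 sec) is taken from the mapping parameters given earlier.

```python
import math

def best_lag(series, period_vols):
    """Return the lag (in volumes) at which a shifted reference cosine
    at the stimulus frequency correlates best with the time series."""
    n = len(series)
    mean = sum(series) / n
    centered = [s - mean for s in series]
    best, best_score = 0, -float("inf")
    for lag in range(period_vols):
        reference = [math.cos(2 * math.pi * (t - lag) / period_vols)
                     for t in range(n)]
        score = sum(c * r for c, r in zip(centered, reference))
        if score > best_score:
            best, best_score = lag, score
    return best

TR_SEC = 2.0
CYCLE_SEC = 512 / 8                       # 64 sec per wedge or ring cycle
PERIOD_VOLS = int(CYCLE_SEC / TR_SEC)     # 32 volumes per cycle
N_VOLS = int(512 / TR_SEC)                # 256 volumes per run

# Synthetic voxel that responds 10 volumes (20 sec) into each cycle.
true_lag = 10
voxel = [100 + 2 * math.cos(2 * math.pi * (t - true_lag) / PERIOD_VOLS)
         for t in range(N_VOLS)]
lag = best_lag(voxel, PERIOD_VOLS)
phase_fraction = lag / PERIOD_VOLS        # fraction of the stimulus cycle
```

The recovered fraction of the cycle is what the color scale on the polar angle and eccentricity maps encodes; hemodynamic delay would add a constant offset that clockwise and counterclockwise (or expanding and contracting) runs can be used to cancel.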
FFA, PPA, and MT+ ROIs
The data from face/house localizer scans were analyzed using a fixed effects general linear model, estimating the neural response as a boxcar function for each condition block, convolved with a standard hemodynamic response function (sum of two gamma functions). The face and house stimulus conditions were predictors. Linear contrasts were used to define the FFA as the region of contiguous voxels in the midfusiform gyrus that responded significantly more to faces than houses and the PPA as the region of contiguous voxels in the parahippocampal gyrus that responded significantly more to houses than faces. The t-test thresholds that were used were the highest value, which would label the FFA and PPA for all participants in both hemispheres in accordance with locations in previous studies (t = 5.4, p < .0001 for both FFA and PPA). The Talairach coordinates for these regions corresponded well to previously reported locations for face-selective or house-selective areas (mean coordinates for all six participants were as follows: FFA −37, −50, −17 and 39, −48, −15; PPA −26, −46, −6 and 27, −43, −5). The data from the MT+ localizer scans were analyzed in a similar manner, but with moving and stationary stimuli as predictors. A linear contrast was used to define a cluster of contiguous voxels that responded more strongly to moving stimuli than stationary stimuli, located at the posterior end of the inferotemporal sulcus. A region was labeled in all six participants in accordance with locations for this area found in previous studies (t = 7.3, p < .0001, mean Talairach coordinates for all participants were −45, −71, 1 and 45, −70, 2).
In addition to the ROIs defined for each individual participant by functional localizer scans (V1, V2, V3, V3A, FFA, PPA, and MT+), another set of ROIs was defined using the group average map for the subtraction of the nonrivalrous control from monocular rivalry in passive viewing. The functional results of the subtraction were viewed on an individual's cortical surface, producing maps of statistical significance (t tests corrected for multiple comparisons using the FDR method, p < .05), and individual ROIs were selected as distinct contiguous clusters of voxels. This was done to identify the whole-brain rivalry network for subsequent comparison of the grating rivalry task to the object rivalry task. Ten ROIs were defined in this way and were labeled as follows: VO = ventral occipital, DO = dorsal occipital, VT = ventral temporal, MTS = medial-temporal sulcus, SP = superior parietal, TPJ = temporoparietal junction, SMA = supplementary motor area, LF = lateral prefrontal, LFa = anterior lateral prefrontal, and OF = orbital frontal. An additional ROI labeled as OP was defined using the subtraction of baseline from the nonrivalrous control stimulus (t tests corrected for multiple comparisons using the FDR method, p < .05).
Psychophysical testing was carried out before the fMRI sessions to characterize the stimuli and choose appropriate stimulus parameters for scanning. Both the grating and object composites were readily perceived as bistable, although the rivalry alternations were somewhat slower for the object stimuli than the grating stimuli. A paired t test revealed that these differences were statistically significant, t(df = 5) = 3.22, p < .05 (Figure 1E). The alternation rates with the control image for monocular rivalry (Figure 1B) in which alternations are much harder to perceive are not shown. The subjects made only a few key presses with this type of image over all of the psychophysical sessions, a factor of 10 fewer than with any of the other rivalry test images.
We next used a contrast adjustment task to provide a measure of the apparent contrast of a stimulus component when it was minimally visible during alternations, thus providing a measure of suppression. In this case, the results of the suppression test were similar for objects and grating stimuli, and a paired t test revealed that the differences were not statistically significant, t(df = 5) = 0.416, p > .05 (Figure 1F). Both types of stimuli were perceived at about 40–50% of the maximal physical contrast. We note that this test of suppression emphasizes visibility, rather than measuring sensitivity via the detection of a test probe added to the nondominant image, as was done in some previous studies (Freeman et al., 2005), which may explain any differences from those studies. Overall, these psychophysical results validate our use of the monocular rivalry stimuli in the fMRI scans and the control condition in which the monocular rivalry alternations were hardly perceptible.
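The paired t tests reported above can be reproduced in outline as follows. The per-participant values in the example are HYPOTHETICAL and serve only to exercise the function; they are not the study's data. Treating the tests as two-tailed at alpha = .05 is also an assumption, under which the critical value for 5 degrees of freedom is 2.571.

```python
import math

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1

T_CRIT_DF5 = 2.571  # two-tailed critical t, alpha = .05, df = 5

# Reported statistics from the text, checked against the critical value.
rate_difference_significant = 3.22 > T_CRIT_DF5          # alternation rates
suppression_difference_significant = 0.416 > T_CRIT_DF5  # contrast matching

# HYPOTHETICAL values for six observers, for illustration only.
made_up_a = [0.45, 0.38, 0.41, 0.50, 0.36, 0.40]
made_up_b = [0.30, 0.25, 0.33, 0.28, 0.31, 0.27]
t_value, df = paired_t(made_up_a, made_up_b)
```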
fMRI Comparison of Nonrivalrous Control to Baseline
The first fMRI comparison was between the monocular rivalry control stimulus, in which the X-junctions were modified to reduce the perception of alternations (Metelli, 1970) and the fixation-only baseline (blank screen). The results, averaged for all six subjects, are shown in Figure 2A. The activation for the nonrivalrous stimulus was confined to posterior occipital cortex (labeled as OP). The retinotopic mapping for each participant confirmed that this activation included early visual areas V1 and V2, but not V3 or V3A. These results demonstrate the rather limited network recruited in our subjects by the simple stimulus features (e.g., orientation, color) without the influence of a bistable percept or task.
fMRI Comparison of Passive Monocular Rivalry to Nonrivalrous Control
Next, a comparison of passive viewing of the grating rivalry stimulus to the nonrivalrous control stimulus, averaged for all six participants, is shown in Figure 2B. The activation for the monocular rivalry stimulus engaged a whole-brain network. Regions of activation included the ventral occipital region (VO; likely including V4), dorsal occipital region (DO; including V3 and V3A), medial-temporal sulcus (MTS; which did not include the motion selective area MT+), ventral temporal region (VT), superior parietal region (SP), TPJ, SMA, lateral prefrontal region (LF), anterior lateral prefrontal (LFa), and orbital frontal region (OF). This widespread activation occurred despite the fact that participants did not perform a key press task (see next section). The results of this stimulus-based subtraction also confirmed that, although there was some lateral occipital cortex activation, this did not include MT+, despite the fact that this area was engaged by binocular rivalry of similar stimuli (Buckthought et al., 2011). As noted in the Methods section, the areas identified in this subtraction were used to create ROIs that represent the whole-brain rivalry network. We could thus use this defined network to query the BOLD signal in each region during either the grating or object rivalry tasks using an independent data set (next section). We also considered it an advantage that this network defined via passive viewing does not explicitly represent the motor aspects of any task, as it is the sensory components we focus on here.
Direct Comparison of Grating Rivalry Task and Object Rivalry Task
When the grating rivalry task and the object rivalry task were each compared with baseline using linear contrasts, the overall pattern of cortical activation was similar, but some differences were visible. There was greater activity in ventral temporal regions for object rivalry, whereas there was greater activity in frontal cortex for grating rivalry. When interpreting greater activation for grating rivalry than object rivalry, it is appropriate to consider the alternation rates that subjects experienced in the scanner. The mean alternation rate for grating rivalry across all subjects (0.402) was above that for object rivalry (0.228). This is equivalent to 12.1 versus 6.84 key presses in a 30-sec block, and a paired t test of the key presses data revealed that these differences were statistically significant, t(df = 5) = 6.03, p < .05. Therefore, we attribute the latter finding to the higher rate of perceptual alternation and associated button responses. To better quantify these effects and compare the two tasks directly with the best sensitivity, we relied on ROI analysis, that is, the 10 ROIs representing the whole-brain passive rivalry network and an additional ROI defined using the rivalry control stimulus (OP). This set of areas should reflect the processing of rivalry per se (or stimulus features, in the case of OP) and was independently defined. However, given that our passive rivalry scan used grating stimuli, there could possibly be a small systematic bias in favor of the grating rivalry task.
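The conversion from alternation rate to key presses per block quoted above is a single multiplication. The check below assumes the rates are expressed in alternations per second, which the text does not state explicitly but which reproduces the reported counts.

```python
BLOCK_SEC = 30  # length of one stimulus block in the task scans

def presses_per_block(rate_per_sec, block_sec=BLOCK_SEC):
    """Expected number of key presses in one block at a given alternation rate."""
    return rate_per_sec * block_sec

grating_presses = presses_per_block(0.402)  # about 12.06, reported as 12.1
object_presses = presses_per_block(0.228)   # about 6.84
rate_ratio = 0.402 / 0.228                  # gratings alternate roughly 1.8 times as often
```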
When the grating and object rivalry tasks were directly compared using linear contrasts, only two regions, the OP and the OF cortex, showed a significantly higher signal for the grating rivalry task (Figure 3A; OP: t = 2.80, p < .01; OF: t = 4.18, p < .01, evaluated in a fixed effects model corrected for multiple comparisons, FDR, df = 7440). Conversely, several extrastriate ROIs (DO, VO, and VT) showed a significantly higher signal for the object rivalry task (DO: t = 3.52, p < .01, VO: t = 2.85, p < .01, and VT: t = 2.61, p < .01; the remaining ROIs were not significant: MTS: t = 0.608, SP: t = 0.15, TPJ: t = 1.55, SMA: t = 1.14, LF: t = 0.58 and LFa: t = 0.33, p > .05 in all cases). The results were very similar for both the left and right hemispheres, so the data were combined across hemispheres. These results were obtained despite the modest biases for grating rivalry that would be expected from the above-mentioned confounds and are consistent with the predictions made by some authors that rivalry with complex images occurs in higher-level visual areas (Freeman, 2005; Freeman et al., 2005; Wilson, 2003; see Discussion).
To take advantage of the retinotopically and functionally specific ROIs and better characterize these results, we also computed the same subtraction of object rivalry task minus grating rivalry task for the more precisely defined visual cortical areas. The results (Figure 3B) echo those shown in Figure 3A, but clarify that the bias in favor of the object rivalry task includes V3, V3A, FFA, and PPA (V3: t = 5.58, p < .01, V3A: t = 6.31, p < .01, FFA: t = 3.97, p < .01, and PPA: t = 7.13, p < .01). In contrast, V1 and V2 show a bias in favor of the grating rivalry task (V1: t = 3.45, p < .01 and V2: t = 3.01, p < .01).
Direct Comparison of Face and House Periods for Object Rivalry Task
Our final comparison focused specifically on the object rivalry task. Given that FFA and PPA are defined according to the preference for object categories, the question arises whether the BOLD signal level fluctuates during rivalry in accordance with the perceptual dominance experienced by each subject. This can be tested by using the key presses that each subject provided during scanning and has indeed been demonstrated previously for binocular rivalry when a face is shown to one eye and a house is shown to the other (Tong et al., 1998). Similarly, we subtracted the BOLD signal during periods in which the house was dominant from the signal during periods of face dominance, for each subject, for our visual cortical ROIs. The results showed that only the FFA was significantly more active for face periods, and only the PPA was significantly more active for house periods (FFA: t = 2.68, p < .01 and PPA: t = 3.21, p < .01, evaluated in a fixed effects model corrected for multiple comparisons, FDR, df = 3995, whereas the following ROIs were not significant: V1: t = 0.293, V2: t = 0.284, V3: t = 0.345, and V3A: t = 0.938, p > .05 in all cases). This effect was found for both the left and right hemispheres, so averaged data are shown (Figure 4).
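The logic of the face-minus-house subtraction can be sketched in Python as a difference of mean ROI signal between the two dominance states. This is a deliberately simplified illustration: it ignores hemodynamic lag and is not the fixed effects model used in the analysis. The function name, the labels, and the signal values are hypothetical.

```python
from statistics import mean


def dominance_contrast(bold, labels, first="face", second="house"):
    """Mean ROI signal during `first`-dominant samples minus `second`-dominant samples."""
    if len(bold) != len(labels):
        raise ValueError("bold and labels must have the same length")
    first_values = [v for v, state in zip(bold, labels) if state == first]
    second_values = [v for v, state in zip(bold, labels) if state == second]
    if not first_values or not second_values:
        raise ValueError("both dominance states must occur at least once")
    return mean(first_values) - mean(second_values)


# Hypothetical ROI signal (percent change) and key-press-derived dominance labels.
roi_signal = [1.0, 1.2, 0.4, 0.6, 1.1, 0.5]
dominance = ["face", "face", "house", "house", "face", "house"]
print(dominance_contrast(roi_signal, dominance))
```

A positive value indicates a region that is more active during face dominance, as reported for the FFA, and a negative value indicates a preference for house dominance, as reported for the PPA.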
In this study, we used fMRI to characterize the neural substrates for monocular rivalry more completely than previously possible. In fact, the only previous fMRI study of monocular rivalry was our own (Buckthought et al., 2011) in which we directly compared monocular rivalry with the much better studied binocular rivalry (see also Buckthought & Mendola, in press, for a broader review of bistable perception). In this report, we focused on comparisons between well-matched stimuli to explore the selectivity of regions in the network for different aspects of monocular rivalry. We provide here a novel demonstration that the pattern of activation during rivalry is dependent on stimulus complexity by direct contrast of comparable stimuli. Although our sample size and fixed effects statistics do necessitate caution in generalized extrapolation, our results fit well with the conclusions from previous studies.
We demonstrated that relatively slight changes in a stimulus that are consistent with ambiguous border ownership and perceptual segmentation result in a dramatic expansion of the brain areas activated, even when no task is performed. Our stimuli were very well matched in terms of luminance, color, and contrast, having the same number of red, green, and yellow pixels. To create the nonrivalrous control stimulus, we specifically took advantage of the nature of X-junctions in the image. Small modifications of the contrast polarity of the junctions preserved mean luminance but were not consistent with the perceptual organization of two transparently overlaid surfaces. This has been explored previously by others and well characterized using psychophysics (Adelson, 2000; Anderson, 1997). We consider these results consistent with previous psychophysical studies that have come to the conclusion that monocular rivalry results from competition in early visual areas (van Boxtel, Knapen, Erkelens, & van Ee, 2008; Pearson & Clifford, 2005; Campbell, Gilinsky, Howell, Riggs, & Atkinson, 1973; Campbell & Howell, 1972), with mechanisms similar to those involved in transparency, as the perception of alternations can be reduced by violating Metelli's law (Metelli, 1970). Like binocular rivalry, monocular rivalry depends upon low-level stimulus characteristics, such as spatial frequency, orientation, contrast, color, and image size (O'Shea et al., 2009; van Boxtel, Knapen, et al., 2008; Pearson & Clifford, 2005; Campbell et al., 1973; Campbell & Howell, 1972).
This network that we defined for monocular rivalry, activated even with no task, still included activity in fronto-parietal areas that are often implicated in attention. It has long been proposed that a common mechanism of attentional selection is involved in all forms of multistability (von Helmholtz, 1866/1924). According to this view, frontal and parietal areas responsible for attentional control initiate perceptual alternations by sending top–down signals to bias activity in visual cortex toward one representation or another (e.g., Britz, Pitts, & Michel, 2011; Sterzer & Kleinschmidt, 2007; but see Knapen, Brascamp, Pearson, van Ee, & Blake, 2011). Indeed, the most prominent sites of activation here match those in previous studies of binocular rivalry (Brouwer et al., 2005, 2009; Lumer et al., 1998), filling-in (Mendola, Conner, Sharma, Bahekar, & Lemieux, 2006), apparent motion (Sterzer & Kleinschmidt, 2007; Sterzer, Russ, Preibisch, & Kleinschmidt, 2002), structure from motion (Raemaekers et al., 2009), and ambiguous figures, including Rubin's face/vase, old woman/young woman (Kleinschmidt et al., 1998), and Necker Cube (Slotnick & Yantis, 2005; Inui et al., 2000).
Specifically, we observed fronto-parietal activation in SP cortex, TPJ, dorsolateral pFC, and in an OF region that may include ventrolateral pFC (VLPF). Thus, the results included both the ventral network of attention (TPJ and VLPF) as well as the dorsal network including SP, intraparietal sulcus, and dorsolateral pFC (Corbetta, Patel, & Shulman, 2008; Kincade, Abram, Astafiev, Shulman, & Corbetta, 2005; Corbetta, Kincade, Ollinger, McAvoy, & Shulman, 2000). The dorsal attention network is likely recruited here, as it selects and links stimuli and responses. In contrast, the VLPF is thought to become active when attention is reoriented to a new stimulus or object of interest, interrupting ongoing selection in the dorsal network, which in turn shifts attention toward the novel stimulus. Given that rivalry consists of a repeating cycle of interrupted attention between the perceptual alternatives, the idea of a cycle of activity in these two attention networks is attractive. Also potentially relevant is recent work on the orbital frontal cortex that suggests a role in perceptual coherence and decision-making (Kahnt, Chang, Park, Heinzle, & Haynes, 2012; Volz, Rübsamen, & von Cramon, 2008; Kringelbach, 2005).
Overall, we found that the pattern of activation for the grating rivalry task and the object rivalry task was similar. However, biases in favor of gratings were shown in a subset of activated regions, using ROI analysis. We are cautious about the interpretation of the preference for grating rivalry in V1 and V2 (and OF), given that (as noted in Results) two methodological factors could account for this result. Not only did we use ROIs that were derived from passive viewing of grating rivalry stimuli, but also the rivalry alternation rate was modestly, but significantly, higher for the grating than the object rivalry task. A higher rivalry rate would mean that the subject experienced a larger number of perceptual alternations and performed more button presses. Nevertheless, we will mention that our emphasis here on monocular rivalry rather than binocular rivalry had the advantage of removing the possibility that it is the presence of monocular neurons that would contribute a bias for simple oriented gratings in V1. Rather, a high selectivity for extended, collinear-oriented features (or the confounds listed before) are more likely explanations.
On the other hand, we suggest that the dissociation in which several extrastriate areas favored object rivalry is more meaningful in this context. The finding that V3, V3A, FFA, and PPA were more active during object rivalry than grating rivalry was obtained despite any systematic bias in the opposite direction. These results are entirely consistent with the network of areas identified as important in object/scene recognition in previous studies. In particular, V3/V3A and a region just anterior, referred to as transverse occipital sulcus, have consistently shown strong object selectivity that is invariant across different cues (Grill-Spector, Kushnir, Edelman, Itzchak, & Malach, 1998), as well as high selectivity for complex scenes (Nasr et al., 2011; MacEvoy & Epstein, 2007). FFA and PPA are also particularly prominent in this network, although both were defined here a priori according to face or house preference, so a bias during rivalry might be expected. Interestingly, it has been suggested before that slower rivalry rates, as we saw in the case of objects versus gratings, are actually a marker for rivalry that recruits higher level visual areas more heavily (Freeman, 2005; Freeman et al., 2005; Wilson, 2003). The current results are consistent with that hypothesis, as well as with other recent psychophysical dissociations between grating and object stimuli in binocular rivalry (Quinn & Arnold, 2010; van Boxtel, Alais, et al., 2008; Alais & Melcher, 2007).
We were further able to demonstrate that during object rivalry the activity in FFA and PPA tracked the participant's percept. This finding fits readily into current concepts of biased competition of visual attention (Desimone & Duncan, 1995) and the idea of common mechanisms of attentional control during ongoing awareness (Leopold & Logothetis, 1999). We are not in a position to distinguish between different types of attention, such as feature based (e.g., color; Corbetta, Miezin, Dobmeyer, Shulman, & Petersen, 1990), object based (face/house), or surface based (Ciaramitaro, Mitchell, Stoner, Reynolds, & Boynton, 2011). Our stimuli were not designed for explicit manipulation of surface perception, although we note in passing that under ideal viewing conditions stimulus components can reside perceptually on different surfaces with some implied depth. Object-based attention has been studied in previous reports using faces and houses transparently superimposed (Serences, Schwarzbach, Courtney, Golay, & Yantis, 2004; O'Craven, Downing, & Kanwisher, 1999). Both studies found that the BOLD signal increases in FFA or PPA when participants are instructed to attend to either the face or house. However, in these studies dynamically moving, achromatic, grayscale images were used, so rivalry was likely quite weak and was not measured. Instead, participants used voluntary attention to monitor either the face or the house according to instructions (see also Sreenivasan, Goldstein, Lustig, Rivas, & Jha, 2009; Furey et al., 2006). Our results complement those findings by showing that endogenously controlled alternations of a bistable static image are also reflected in fluctuations in category-specific cortex.
In addition, the dissociation between FFA and PPA during monocular rivalry replicates a similar finding in the case of binocular rivalry (Tong et al., 1998). We note that, in both cases, the face and house stimuli were colored differently to facilitate the rivalry perception, and it is known that color alone can be a modest cue to drive rivalry (Hong & Shevell, 2008; Holmes, Hancock, & Andrews, 2006; Kitterle & Thomas, 1980). However, it is object selectivity, not color, that determines the response profile of these areas. Moreover, these results are fully consistent with the previous finding that the FFA (but not object-selective regions of the lateral occipital cortex or the parahippocampal gyrus) showed a BOLD signal increase when subjects perceived the face (rather than a collection of blobs) in thresholded, Mooney-type face stimuli such as ours (Andrews & Schluppeck, 2004; see also Boutet & Chaudhuri, 2001). In this report, we did not attempt the analogous demonstration for the grating rivalry task of fluctuating populations of orientation-selective neurons in V1, because of the high spatial resolution needed and the fast rate of rivalry. Nevertheless, this has been demonstrated recently for binocular rivalry using multivoxel pattern analysis to predict which of two (rotating) orthogonally oriented gratings was perceptually dominant (Haynes & Rees, 2005). Whether the same result would hold for monocular rivalry, where no segregation of monocular neuronal responses would be expected, is an interesting question for the future.
By focusing on the neural substrates of monocular rivalry in these experiments, we might be expected to generate data that would be relevant to the ongoing debate about which levels of the visual hierarchy are responsible for coding the neural competition that correlates with perceptual alternations. Models of binocular rivalry include competition at multiple levels of the visual hierarchy. At the lowest level, binocular rivalry may involve interocular inhibition between monocular neurons in V1 (“eye-based rivalry”; Wilson, 2007; Freeman, 2005). At higher levels, binocular rivalry also involves competition between the representation of different features (“stimulus” or “pattern rivalry”) that may be combined across the eyes (Wilson, 2003; Sheinberg & Logothetis, 1997; Kovacs et al., 1996; Logothetis et al., 1996). It can be assumed that monocular rivalry does not engage competition between monocular neurons given that dichoptic stimulation is not used. In other words, monocular rivalry is by definition a type of “pattern rivalry,” in which no interocular competition is present. It is thus notable that we obtained evidence for neural competition between stimulus alternatives at both low (V1, V2) and higher (V3, V3A, FFA, and PPA) levels, depending on the specific stimulus. This is consistent with neural competition at lower levels occurring between simple stimulus features (well described by oriented gratings), whereas competition at higher levels occurs between progressively more complex features (such as face/house). These results add weight to the emerging consensus for multilevel models of bistable perception (Wilson, 2007; Tong et al., 2006; Freeman, 2005; Pearson & Clifford, 2004). In conclusion, we provide further evidence for functionally specific neural competition occurring at different levels in the visual system and confirm that activity in higher-level areas closely reflects the subject's moment-to-moment perception during monocular rivalry alternations.
This work was supported by NSERC and NIH R01 EY015219 grants as well as a LOF grant from the Canadian Foundation for Innovation. We thank Hugh R. Wilson for providing the face stimuli used in the experiments and Robert F. Hess for providing MR-compatible equipment for dichoptic stimulation.
Reprint requests should be sent to Dr. Janine Mendola, McGill Vision Research Unit, Royal Victoria Hospital, 687 Pine Avenue West, Room H4-14, Montreal, Quebec H3A 1A1, Canada, or via e-mail: email@example.com.