The nature of the building blocks of information in visual working memory (VWM) is a fundamental issue that has not been well resolved. Most researchers take objects as the building blocks, although this perspective has received criticism. The objects could be physically separated ones (strict object hypothesis) or hierarchical objects created from separated individuals (broad object hypothesis). Meanwhile, a newly proposed Boolean map theory for visual attention suggests that Boolean maps may be the building blocks of VWM (Boolean map hypothesis); this perspective could explain many critical findings of VWM. However, no previous study has examined these hypotheses. We explored this issue by focusing on a critical point on which they make distinct predictions. We asked participants to remember two distinct objects (2-object), three distinct objects (3-object), or three objects with repeated information (mixed-3-object; e.g., one red bar and two green bars, in which the green bars could be represented as one hierarchical object) and adopted contralateral delay activity (CDA) to tap into the maintenance phase of VWM. The mixed-3-object condition could generate two Boolean maps (Boolean map hypothesis), three objects (strict object hypothesis), or three objects on most trials (broad object hypothesis; hierarchical objects are created on certain trials, leaving two objects). Simple orientations (Experiment 1) and colors (Experiments 2 and 3) were used as stimuli. Although the CDA of the mixed-3-object condition was slightly lower than that of the 3-object condition, no significant difference was revealed between them. Both conditions displayed significantly higher CDAs than the 2-object condition. These findings support the broad object hypothesis. We further suggest that Boolean maps might be the unit for retrieval/comparison in VWM.
As one of the most critical information-processing modules used by human beings, visual working memory (VWM) has received attention from multiple research fields in the last decade (see Baddeley, 2012; Fukuda, Awh, & Vogel, 2010; Luck & Hollingworth, 2008, for reviews). So far, using various methods (e.g., behavioral, ERP, fMRI, magnetoencephalography [MEG], mathematical modeling), researchers have examined the mechanisms of VWM extensively. For instance, VWM capacity has been examined in children, young adults, elderly adults, dysfunctional people, monkeys, and rats (e.g., van den Berg, Shin, Chou, George, & Ma, 2012; Jost, Bryck, Vogel, & Mayr, 2011; Cowan, Morey, AuBuchon, Zwilling, & Gilchrist, 2010; Lee et al., 2010; Bays & Husain, 2008; Xu & Chun, 2006; Ragozzino, Detrick, & Kesner, 2002). However, the nature of the building blocks of visual information in VWM is a fundamental question that has not been well resolved. In the current study, we investigated this question by distinguishing two distinct views that the storage unit of VWM consists of objects versus Boolean maps (cf. Huang, 2010c; Huang & Pashler, 2007). We recorded ERPs in three experiments to tap into the maintenance phase of VWM directly.
Since the seminal work of Luck and Vogel (1997), the hypothesis that visual objects comprise the storage unit of VWM has been advanced and supported fairly strongly. For instance, the performance of memorizing a few objects built from different dimensions of multiple features (e.g., a bar with varying color and orientation) is as good as that of memorizing the same number of single-featured objects (e.g., varying in color only; see Fukuda et al., 2010, for a review; Luria & Vogel, 2011; Lee & Chun, 2001; Vogel, Woodman, & Luck, 2001). Critically, the accuracy of remembering bicolored objects (i.e., those having two features from the same dimension) did not drop relative to the memory of single-colored objects; this runs against the predictions of the feature-based storage view (i.e., the view that features are the storage units of information in VWM; Luck & Vogel, 1997). Therefore, object-based storage in VWM seems to be a rather reasonable hypothesis. Indeed, most current VWM studies either implicitly or explicitly hold this hypothesis (e.g., Woodman, Vogel, & Luck, 2012; Gao, Gao, Li, Sun, & Shen, 2011; Zhang & Luck, 2011; Xu & Chun, 2009; Awh, Barton, & Vogel, 2007). This hypothesis generally assumes that a unit in VWM corresponds to an object at one location (strict object hypothesis), and both Gestalt grouping of proximity and connectedness are essential factors in defining an object in VWM (Xu, 2006). Notably, the scope of defining an object in VWM has been further broadened by recent studies (Anderson, Vogel, & Awh, 2012; Brady, Konkle, & Alvarez, 2011). Researchers suggested that information stored in VWM is hierarchically structured, and Gestalt grouping, chunking, and so forth, play important roles (see Brady et al., 2011, for a review). 
An object is defined as the top level of a hierarchical representation of the scene (broad object hypothesis): Physically separated objects are treated as one object (here, we call it hierarchical object) and hold in one VWM unit if clear Gestalt grouping cues exist among them. For instance, two individual discs, each containing a rectangular gap, occupy one unit in VWM when a strong collinearity cue exists (see, e.g., Anderson et al., 2012).
However, some recent studies from various laboratories have failed to replicate the findings of the aforementioned experiments using bicolored objects and hence challenged the object-based storage hypotheses. Those studies have revealed a memorizing cost for bicolored objects relative to the same number of single-colored objects (e.g., Parra, Sala, et al., 2011; Olson & Jiang, 2002; Wheeler & Treisman, 2002). Parra and colleagues further suggested that the color–color conjunction was stored as individual features whereas the relationship between the features was maintained (Parra, Cubelli, & Della Sala, 2011). In line with the above findings, a recent experiment demonstrated that the ERP amplitudes evoked by bicolored objects were significantly higher than those evoked by single-featured objects when participants were asked to remember the same number of objects (i.e., two objects), particularly during the 450–600 msec after the memory onset. However, those ERP amplitudes were considerably smaller than those evoked by asking participants to remember twice as many single-featured objects (i.e., four objects; see Experiment 2 in Luria & Vogel, 2011). Regardless of the results for bicolored objects, researchers currently largely accept the object-based account of the storage unit of VWM, particularly considering that it fits a “slots + averaging” model of VWM capacity (Zhang & Luck, 2008) and its ease of explanation.
The endeavor to seek alternative answers to the building blocks of VWM continues; a compromise between the findings regarding multiple-featured and bicolored objects is sought. One alternative that seemingly holds promise is the idea of Boolean maps as the fundamental building blocks of VWM. The Boolean map theory is a newly proposed yet very promising theory of visual attention by Huang and colleagues (Huang & Pashler, 2007, 2012; Huang, 2010a, 2010b, 2010c; Huang, Treisman, & Pashler, 2007). Different from the traditional view that the visual object is the unit of visual attention, this theory proposes that there are two different processes in visual attention: Attention first selects information in an object-based manner and then accesses the information via a Boolean map. Here, access shares the same meaning as the capacity limits of processing, referring to the ability of selected visual information to reach the level of consciousness in human beings. A Boolean map is a data structure of the visual field—it employs binary code to mark the visual field as selected and un-selected subregions. Thus, a Boolean map actually contains a set of locations. There are two basic ways to create a Boolean map. First, each Boolean map can be associated with a featural label, but such labeling is restricted to only one feature value per dimension for a map. For instance, if three distinct colors were presented, then three distinct Boolean maps would be created accordingly; however, three identical colors would lead to only one Boolean map. Second, a Boolean map can simultaneously contain multiple featural labels from multiple distinct feature dimensions sharing one spatial location. For example, a single visual object containing three distinct features (e.g., color, shape, size) can be represented in three Boolean maps but can also be represented in one Boolean map.
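The first way of creating Boolean maps can be illustrated with a short sketch. It is only an illustration of the rule that a map carries one feature value per dimension, not an implementation taken from Huang and colleagues; the function name `boolean_maps_by_feature`, the dictionary layout of an item, and the use of grid coordinates as locations are all assumptions made for this example.

```python
from collections import defaultdict


def boolean_maps_by_feature(items, dimension):
    """Return one Boolean map per distinct value of `dimension`.

    Each map is modeled as a frozenset of selected locations, keyed by its
    single featural label (hypothetical representation for illustration).
    """
    maps = defaultdict(set)
    for item in items:
        # Items sharing a feature value fall into the same map.
        maps[item[dimension]].add(item["loc"])
    return {label: frozenset(locs) for label, locs in maps.items()}


# One red item and two green items at arbitrary (hypothetical) locations.
display = [
    {"loc": (0, 1), "color": "red"},
    {"loc": (2, 3), "color": "green"},
    {"loc": (4, 0), "color": "green"},
]

maps = boolean_maps_by_feature(display, "color")
print(len(maps))  # 2: one map labeled "red", one labeled "green"
```

Under this rule, three distinct colors yield three maps and three identical colors yield a single map, which matches the examples given in the text.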
The assumption that the Boolean map serves as the building block of VWM is acceptable both theoretically and empirically. From the theoretical perspective, VWM is assumed to be actively involved in our on-line activities and in conscious information processing (Baddeley, 2012; Cowan, 2001), in line with the core assumption of the Boolean map (Huang et al., 2007). Moreover, recent studies reveal a very close correspondence between visual perception and VWM (e.g., Gao, Gao, et al., 2011; Harrison & Tong, 2009; Serences, Ester, Vogel, & Awh, 2009; Mayer et al., 2007). For instance, the cerebral regions in charge of perceptual processing are also involved in VWM processing of the same visual information, and accumulating evidence suggests that visual perception and VWM share similar attentional mechanisms. From the empirical perspective, many critical VWM findings can be well explained by the Boolean map theory when simple (note not complex) visual features are adopted as stimuli. Let us briefly review a few important VWM findings and see how the Boolean map theory works. First, as VWM can hold three to four simple visual objects (Luck & Vogel, 1997), Boolean map theory would claim that about three to four Boolean maps can be maintained. Second, bicolored stimuli cannot be stored as integrated objects in VWM (e.g., Parra, Sala, et al., 2011; Wheeler & Treisman, 2002), particularly in the initial phase of VWM (see Experiment 2 in Luria & Vogel, 2011). The Boolean map theory offers a very direct explanation for this pattern of findings: Each Boolean map has only one featural label from a feature dimension. Third, visual objects containing multiple distinct features from different dimensions are stored in an object-based manner (e.g., Gao, Gao, et al., 2011; Gao, Li, Yin, & Shen, 2010; Luck & Vogel, 1997). In contrast, two physically connected parts with distinct features could not be stored as an integrated object in VWM (e.g., the mushroom-like stimuli in Xu, 2002). 
These findings can be well explained by the second basic method of Boolean map creation. Finally, recent studies have implied that the focus of attention moved in a serial fashion in VWM, scanning one object at a time (Zelinsky, Loschky, & Dickinson, 2011; Awh, Vogel, & Oh, 2006). Similarly, the Boolean map theory claims that, at each time point, attention can only access one Boolean map. Therefore, although Boolean map theory does not explicitly state whether the theory is perception-based or memory-based, Huang and colleagues suggested that the principles of Boolean map theory may fit VWM and that Boolean maps may be the building blocks of VWM (Huang, 2010c; Huang & Pashler, 2007). Indeed, certain critical experiments supporting the Boolean map theory have adopted tasks (e.g., Experiments 3 and 4 in Huang & Pashler, 2007) that were quite analogous to the ones used to measure the consolidation of VWM (cf. Vogel, Woodman, & Luck, 2006).
However, no previous study has tested this promising and interesting alternative. If the Boolean map hypothesis of VWM were true, it would substantially change the current understanding of VWM. The current study therefore aimed to examine this hypothesis. To focus on the maintenance phase of VWM, we adopted contralateral delay activity (CDA), the amplitude of which provides on-line tracking of the amount of stored visual information in VWM, as the neural marker of interest (e.g., Diamantopoulou, Poom, Klaver, & Talsma, 2011; Ikkai, McCollough, & Vogel, 2010; Jolicoeur, Brisson, & Robitaille, 2008; Vogel & Machizawa, 2004). CDA has a direct link with the storage unit of VWM, and employing this index avoids any potential contamination from other processing phases (e.g., retrieval and comparison; Awh et al., 2007). The amplitude of CDA rises linearly with the number of stored units of information in VWM and levels off when VWM capacity is reached. Previous CDA studies using distinct simple features as the tested stimuli have demonstrated almost identical patterns of results to those predicted by Boolean map theory. For instance, Vogel and colleagues found that there was no difference in CDA amplitude between remembering the same number of colorless bars (one feature dimension) and colored bars (two feature dimensions; Luria & Vogel, 2011; Woodman & Vogel, 2008).
In the current study, we tested an important prediction of the Boolean map theory: identical feature values in the visual field correspond with only one Boolean map. This prediction received tentative support from one of our recent CDA studies (Gao, Xu, et al., 2011). In that research, we found that the CDA amplitude evoked by remembering four identical colors was similar to that evoked by remembering one color; both of those amplitudes were significantly lower than that elicited by remembering four distinct colors. However, the broad object hypothesis of VWM could also explain this finding, because there is a strong similarity grouping cue. This cue generates a hierarchical object in VWM, which only occupies one VWM unit. One may argue that the Boolean map theory had already suggested that similarity grouping may operate by selecting physically separated yet similar objects to include in a Boolean map (cf. Huang & Pashler, 2007); therefore, the hierarchical object explanation in terms of the broad object hypothesis is not an independent alternative. Although this argument is reasonable regarding Gao, Xu, et al. (2011), the hierarchical object explanation could not be replaced by the Boolean map explanation. These two explanations have distinct predictions when only weak similarity grouping cues exist. For instance, when one red color and two identical green colors are randomly displayed within a visual field and required for memorization, the Boolean map explanation predicts that only two Boolean maps will be generated (one for red color and one for green colors). 
However, the hierarchical object explanation predicts that a hierarchical object will not always be generated for the two green colors because the similarity cue is weak.1 Indeed, quite a few previous VWM studies allowed for the repetition of one feature value (e.g., two identical colors) in the experiments (e.g., Woodman et al., 2012; Vogel et al., 2001; Jiang, Olson, & Chun, 2000) and assumed that the participants processed and stored objects in a strict object-based manner while the similarity grouping factor was effectively controlled.
To this end, in the current study, we examined the building blocks of VWM by focusing on a condition in which, as mentioned above, only weak similarity cues exist. Specifically, we displayed two (2-object) or three objects to the participants and manipulated the composition of the three objects. Three distinct objects were displayed in 50% of the three-object trials (3-object), whereas in the other 50% of the three-object trials, the three objects contained only two distinct feature values because two of them shared one value (mixed-3-object). If the content in VWM is stored in a Boolean map manner, then there will be no difference between the mixed-3-object and 2-object conditions in terms of CDA amplitude, and both conditions will generate lower CDA amplitudes than the 3-object condition. In contrast, if individual objects are stored in a strict object manner, then there will be no difference between the mixed-3-object and 3-object conditions in terms of CDA amplitude, and both conditions will generate significantly higher CDA amplitudes than the 2-object condition. Finally, if physically separated objects can be stored in a hierarchical object manner because of similarity grouping, then the CDA amplitude of the mixed-3-object condition will be slightly lower than that of the 3-object condition but higher than that of the 2-object condition.
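The three predictions can be made concrete by counting the expected number of occupied VWM units per display. The following is a minimal sketch under stated assumptions: the function `predicted_units` and the parameter `p_group` (the proportion of trials on which repeated values are grouped into one hierarchical object) are hypothetical and introduced only for illustration; the study itself does not estimate such a proportion.

```python
def predicted_units(feature_values, hypothesis, p_group=0.0):
    """Expected number of VWM units occupied by a display.

    feature_values: one feature value per displayed object.
    p_group: hypothetical proportion of trials on which repeated values are
             grouped into a single hierarchical object (broad object only).
    """
    n_items = len(feature_values)
    n_distinct = len(set(feature_values))
    if hypothesis == "boolean_map":
        # One map per distinct feature value.
        return float(n_distinct)
    if hypothesis == "strict_object":
        # One unit per physically separate object.
        return float(n_items)
    if hypothesis == "broad_object":
        # Grouping succeeds on some trials only, so the expectation lies
        # between the two cases above.
        return p_group * n_distinct + (1.0 - p_group) * n_items
    raise ValueError(f"unknown hypothesis: {hypothesis}")


mixed = ["red", "green", "green"]
print(predicted_units(mixed, "boolean_map"))         # 2.0
print(predicted_units(mixed, "strict_object"))       # 3.0
print(predicted_units(mixed, "broad_object", 0.25))  # 2.75
```

Because CDA amplitude tracks the number of stored units, an expected unit count between 2 and 3 corresponds to a mixed-3-object amplitude that falls between the 2-object and 3-object amplitudes, closer to the latter when grouping is rare.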
Experiment 1 was designed to test the three hypotheses by asking the participants to remember orientations.
Twelve students (four women; 18–26 years old) from Zhejiang University volunteered to participate in the experiment and received ¥40 as payment. None had a history of neurological problems, and all had normal or corrected-to-normal vision. The participants provided written and informed consent before the experiment was conducted. The procedures were in compliance with the Code of Ethics of the World Medical Association (Declaration of Helsinki) and approved by the Research Ethics Board of Zhejiang University and granting agency.
Each memory item was a 0.75° × 0.35° black bar (red, green, blue [RGB]: 0, 0, 0). The stimuli were presented on a gray background (128, 128, 128) on a 17-in. CRT monitor (100 Hz refresh rate). The orientation of each bar was selected from four orientations: 0°, 45°, 90°, and 135°.
Design and Procedure
Participants were seated in an electrically shielded and sound-attenuated recording chamber at a distance of 70 cm from the CRT monitor. The stimuli were presented within two rectangular areas subtending 3.6° × 6° of visual angle, centered 2.8° to either the left or the right of a central fixation cross. The memory array consisted of two different bars (2-object), three different bars (3-object), or three bars with two of them sharing one orientation (mixed-3-object). In this case, all of the memory loads were within the VWM capacity of normal young adults; thus, the participants could fulfill the task fairly easily without employing any particular strategy. The positions of the memory items were randomly selected in each trial with the constraint that the center-to-center distance between items be at least 1.2°. Participants were required to keep their eyes centrally fixated during the whole experiment.
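The placement constraint described above (random positions with a center-to-center distance of at least 1.2°) can be satisfied by simple rejection sampling. The sketch below is one possible approach, not the authors' stimulus code; the function name `place_items`, the coordinate origin at one corner of the 3.6° × 6° area, and the `max_tries` safeguard are assumptions, and it ignores the extent of the bars themselves.

```python
import math
import random


def place_items(n_items, width=3.6, height=6.0, min_dist=1.2,
                rng=None, max_tries=10000):
    """Draw item centers (in degrees) inside a width x height area.

    A candidate is rejected if it lies closer than `min_dist` to any
    already accepted center.
    """
    rng = rng or random.Random()
    positions = []
    tries = 0
    while len(positions) < n_items:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all items")
        candidate = (rng.uniform(0.0, width), rng.uniform(0.0, height))
        if all(math.dist(candidate, p) >= min_dist for p in positions):
            positions.append(candidate)
    return positions


# Example: three item centers for one hemifield, with a fixed seed.
centers = place_items(3, rng=random.Random(1))
```

With at most three items in an area of this size, rejections are infrequent, so the loop terminates quickly in practice.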
The procedure of Experiment 1 is shown in Figure 1A. Each trial began with a 200-msec arrow cue pointing either to the left or right presented above a fixation cross. After a variable delay of 300–400 msec, a memory array was displayed for 100 msec. A blank screen was then presented for 900 msec, followed by a 2000-msec presentation of a test item. The orientation of the test item in the cued hemifield was different from that of a memory stimulus in the same location in 50% of trials and was identical in the remaining 50% of trials. To prevent participants from making decisions on the basis of the configuration information of the memorized objects (Jiang et al., 2000), we employed a partial probe task by merely displaying one bar in each rectangular area. The participant was required to indicate whether the test item was identical to the memory stimulus in the same location, and accuracy rather than response speed was stressed. For the test items displaying changed orientations, new orientations not used in the memory array were randomly selected for display.
Each participant performed 160 trials per set size, resulting in a total of 480 trials presented in a randomized order. The experiment was divided into six blocks with 5-min breaks between them. Before the formal experiment, at least 16 practice trials were completed to ensure that the participants understood the instructions.
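The trial structure can be sketched as a randomized list split into blocks. This is an illustration only: the text specifies 160 trials per condition, 480 trials in total, six blocks, and change trials on 50% of trials, whereas the equal block size of 80 trials, the balancing of change and cue side within each condition, and the names `build_trials` and `CONDITIONS` are assumptions of this example.

```python
import random

CONDITIONS = ("2-object", "3-object", "mixed-3-object")


def build_trials(trials_per_condition=160, n_blocks=6, seed=0):
    """Return a list of blocks, each a list of trial dictionaries."""
    rng = random.Random(seed)
    trials = []
    for condition in CONDITIONS:
        for i in range(trials_per_condition):
            trials.append({
                "condition": condition,
                # Assumed balancing: half change trials, half cued left.
                "change": i % 2 == 0,
                "cue_side": "left" if (i // 2) % 2 == 0 else "right",
            })
    rng.shuffle(trials)  # randomized order across all conditions
    block_size = len(trials) // n_blocks
    return [trials[b * block_size:(b + 1) * block_size]
            for b in range(n_blocks)]


blocks = build_trials()
print(len(blocks), len(blocks[0]))  # 6 80
```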
Electrophysiological Recording and Analyses
EEG recordings were made at 62 scalp sites by using Ag/AgCl electrodes mounted on an elastic cap. All recordings were made using a left mastoid reference, and then the data were re-referenced offline to the algebraic average of the left and right mastoid voltages. Vertical EOGs and horizontal EOGs were recorded using two pairs of electrodes. One pair was placed above and below the left eye, and the other pair was placed at the outer canthus of the two eyes. All interelectrode impedances were maintained below 5 kΩ. The EEG and EOG signals were amplified by a SynAmps2 amplifier (Compumedics NeuroScan, USA) using a 0.05- to 100-Hz band-pass filter and were continuously sampled at 500 Hz/channel for offline analysis.
The data were corrected for eye blinks by using a regression procedure, and trials contaminated by horizontal eye movements greater than 2° (horizontal EOG amplitude > 32 μV) were excluded from analysis. The data were then segmented into epochs ranging from 200 msec before to 1000 msec after the onset of the memory stimulus for all conditions. Trials with remaining artifacts exceeding ±75 μV in amplitude after the regression procedure were rejected. The contralateral waveforms were computed by finding the average of the activity levels recorded at left- and right-hemisphere sites when participants were cued to remember the opposite side of the memory stimulus. CDA was calculated by subtracting the ipsilateral activity from the contralateral activity. The averaged CDA waveforms were smoothed by applying a 10-Hz low-pass filter (24 dB/oct). The CDA difference between the 2-object and 3-object conditions began to appear at 400 msec, and this difference lasted about 400 msec. Therefore, a time window of 400–800 msec after the onset of the memory stimulus was adopted to measure the mean amplitude of CDA. Electrode sites in the parietal (P1/P2, P3/P4, P5/P6, and P7/P8) and occipitoparietal (PO3/PO4, PO5/PO6, and PO7/PO8) regions (Figure 1B) were selected for further analysis.2 Because the result patterns were similar between the two regions, we pooled the data for the electrodes in the two brain regions, forming a representative site.
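The CDA measure described above reduces to a difference wave averaged over a time window. The following sketch assumes averaged waveforms stored as plain lists of voltages in μV, sampled at 500 Hz from 200 msec before memory onset, and a half-open window that includes the start sample but not the end sample; the function names are hypothetical and the code is not the authors' analysis pipeline.

```python
SAMPLING_RATE_HZ = 500   # from the recording description
EPOCH_START_MS = -200    # epochs begin 200 msec before memory onset


def ms_to_sample(t_ms):
    """Convert a latency relative to memory onset into a sample index."""
    return round((t_ms - EPOCH_START_MS) * SAMPLING_RATE_HZ / 1000)


def cda_mean_amplitude(contra, ipsi, start_ms=400, end_ms=800):
    """Mean of (contralateral - ipsilateral) within [start_ms, end_ms)."""
    if len(contra) != len(ipsi):
        raise ValueError("waveforms must have the same length")
    difference = [c - i for c, i in zip(contra, ipsi)]
    lo, hi = ms_to_sample(start_ms), ms_to_sample(end_ms)
    window = difference[lo:hi]
    return sum(window) / len(window)


# Toy waveforms: 600 samples cover -200 to 998 msec at 2 msec per sample.
contra = [-1.5] * 600
ipsi = [-0.5] * 600
print(cda_mean_amplitude(contra, ipsi))  # -1.0
```

For the color experiments, the same function would be called with `end_ms=1000` to match the 400–1000 msec window; a more negative value indicates a larger CDA.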
One-way repeated-measures ANOVAs with Load (2-object, 3-object, and mixed-3-object) as the within-subject factor were conducted on Accuracy, RT, and the Mean Amplitude of CDA during the time window of interest. For factors having more than two levels, the Greenhouse–Geisser correction (Epsilon) was used to adjust the degrees of freedom when necessary. Significant main effects (p < .05) were followed up with post hoc Bonferroni-corrected contrasts.
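For readers checking the reported statistics, the degrees of freedom and the partial eta squared of a one-way repeated-measures ANOVA follow directly from the design and the F value. The sketch below uses the standard identities (uncorrected degrees of freedom, and partial eta squared computed as F × df_effect / (F × df_effect + df_error)); the function names are hypothetical, and the code is offered as a consistency check rather than as the analysis actually run.

```python
def rm_anova_df(n_participants, n_levels):
    """Uncorrected degrees of freedom for a one-way repeated-measures ANOVA."""
    df_effect = n_levels - 1
    df_error = (n_participants - 1) * (n_levels - 1)
    return df_effect, df_error


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from F and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Twelve participants and three load levels, as in Experiment 1.
df_effect, df_error = rm_anova_df(12, 3)
print(df_effect, df_error)  # 2 22
print(round(partial_eta_squared(15.19, df_effect, df_error), 2))  # 0.58
```

The same identities give error degrees of freedom of 46 for 24 participants and 40 for 21 participants.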
Figure 2A and B shows that performance is better in the 2-object condition than in the other two conditions; moreover, the performance in the mixed-3-object condition is also better than that in the 3-object condition. Confirming this observation, a one-way ANOVA revealed a significant main effect of Load on both Accuracy, F(2, 22) = 15.19, p < .001, ηp2 = .58, and RT, F(2, 22) = 43.51, p < .001, ηp2 = .80. Post hoc contrasts confirmed that the accuracy was significantly higher in the 2-object than in the 3-object condition (p < .05) or the mixed-3-object condition (p < .05); yet, no difference in accuracy existed between the latter two conditions (p > .05). However, RT was significantly longer in the 3-object than in the mixed-3-object condition (p < .002), and the RTs in both of those conditions were significantly longer than that in the 2-object condition (ps < .002).
Figure 2C shows that CDA amplitudes for both the 3-object and mixed-3-object conditions are higher than that for the 2-object condition, and CDA amplitude for the 3-object condition is also slightly higher than that for the mixed-3-object condition. The ANOVA on the Mean CDA Amplitude revealed a significant main effect of Load, F(2, 22) = 12.27, p < .001, ηp2 = .53. Post hoc contrasts revealed considerably higher CDA amplitudes in the 3-object (−1.37 μV; p = .001) and mixed-3-object (−1.24 μV; p = .05) conditions than in the 2-object condition (−0.89 μV). However, the CDA amplitude of 3-object condition did not significantly differ from that of the mixed-3-object condition (p > .2).
The behavioral and CDA results in Experiment 1 were generally not congruent with the predictions of either the Boolean map hypothesis or the strict object hypothesis. We argue that the better performance and lower CDA in the mixed-3-object condition were related to the construction of hierarchical objects in VWM through similarity grouping. Therefore, the current results supported the broad object hypothesis of VWM storage.
Experiment 2 was designed to test whether the findings of Experiment 1 were restricted to one specific feature dimension. Therefore, we asked the participants to remember simple colors instead of orientations while using the same type of stimuli as in Experiment 1.
A new group of 24 students (13 women; 18–26 years old) from Zhejiang University participated in the experiment.
The same stimuli as in Experiment 1—except in color—were used in Experiment 2. Seven simple colors were used for memorization: black (0, 0, 0), red (255, 0, 0), green (0, 255, 0), blue (0, 0, 255), yellow (255, 255, 0), purple (255, 0, 255), and white (255, 255, 255). The same orientation was used for all the memorized objects in each memory array presented during a trial, but their colors were manipulated. Similar to Experiment 1, there were three load levels: Participants were asked to memorize two distinct colors, three distinct colors, or three colored stimuli with two of them sharing one color.
Following previous studies using color memory tasks, a time window of interest from 400 to 1000 msec after the onset of the memory stimulus was adopted to measure the mean amplitude of CDA (e.g., Luria, Sessa, Gotler, Jolicoeur, & Dell'Acqua, 2010; Gao et al., 2009; Vogel, McCollough, & Machizawa, 2005).
The other aspects of Experiment 2 were identical to those of Experiment 1.
Figure 3A and B shows a similar pattern of results to Experiment 1. A one-way repeated-measure ANOVA with Load as a within-subject factor yielded significant main effects of Load on both Accuracy, F(2, 46) = 134.33, p < .001, ηp2 = .85, and RT, F(2, 46) = 76.03, p < .001, ηp2 = .77. Post hoc contrasts revealed that accuracy was significantly higher in the 2-object than in the mixed-3-object condition (p < .001), and the accuracy in both of those conditions was significantly higher than that in the 3-object condition (ps < .001). In addition, RT was significantly longer in the 3-object than in the mixed-3-object condition (p < .001), and RT in both of those conditions was significantly longer than in the 2-object condition (ps < .005).
Although the CDA difference between conditions with two and three distinct objects was more prominent in Experiment 2 than in Experiment 1, the pattern of results is analogous to that of Experiment 1 (see Figure 3C). In line with this observation, the one-way repeated ANOVA revealed a significant main effect of Load, F(2, 46) = 8.41, p = .001, ηp2 = .27. Post hoc contrasts found significantly higher CDA amplitudes in both the 3-object (−0.97 μV; p < .01) and mixed-3-object (−0.89 μV; p < .01) conditions than in the 2-object condition (−0.56 μV); no difference was revealed between the 3-object and mixed-3-object conditions (p > .9).
Using a different feature dimension, we replicated the findings of Experiment 1. Thus, Experiments 1 and 2 consistently suggest that simple features are not stored in VWM using Boolean maps; instead, the repetition of a feature value caused hierarchical objects to be created in VWM on certain trials.
In Experiments 1 and 2, the participants were required to compare objects displayed in the same spatial locations between the memory array and the test array. In this way, participants had to encode location information into VWM. This extra information may have contaminated the findings, particularly considering that location information and other nonspatial features are stored by different brain regions (e.g., Carlesimo, Perri, Turriziani, Tomaiuolo, & Caltagirone, 2001). Experiment 3 eliminated this factor by always presenting the test item at the center of the predefined visual field; in this way, the participants did not need to process the location information.
A new group of 21 students (12 women; 18–26 years old) from Zhejiang University participated. The test items were presented at the centers of two predefined visual fields. The participants were required to indicate whether the probe had been displayed in a previous memory array. The other aspects of the experiment were the same as Experiment 2.
Figure 4A and B shows that location indeed affects the task performance: similar accuracy and RT were observed between the 2-object and mixed-3-object conditions, and better performance was observed in both of those conditions than in the 3-object condition. A one-way repeated-measures ANOVA with Load as a within-subject factor revealed significant main effects on both accuracy, F(2, 40) = 66.85, p < .001, ηp2 = .77, and RT, F(2, 40) = 39.15, p < .001, ηp2 = .66. Post hoc contrasts found that there was no difference between the 2-object and mixed-3-object conditions in terms of either Accuracy or RT (ps > .05); however, higher accuracies and quicker RTs were observed in both of those conditions than those in the 3-object condition (ps < .001).
As shown in Figure 4C, the CDA amplitudes are quite close between the mixed-3-object and 3-object conditions, although there is a slight drop at 400–700 msec in the mixed-3-object condition. A one-way ANOVA on the mean CDA amplitude at 400–1000 msec yielded a significant main effect of Load, F(2, 40) = 12.80, p < .001, ηp2 = .39. Post hoc contrasts revealed that the CDA amplitude was significantly lower in the 2-object condition (−0.66 μV) than in the mixed-3-object (−1.03 μV; p = .001) and 3-object (−1.08 μV; p < .001) conditions; however, no difference was found between CDA amplitudes in the latter two conditions (p > .9).
In controlling for the possible contamination of the results by location information, we found that there was no difference between the 2-object and mixed-3-object conditions in terms of either accuracy or RT. These findings suggest that location information indeed played a role in the behavioral performance of Experiments 1 and 2. However, the CDA did not show the same pattern. On the contrary, the CDA results demonstrated an even clearer pattern supporting object-based storage. Considering that CDA is an index of on-line tracking of the information maintained in VWM, whereas behavioral results reflect the final outputs after a series of processing steps, we argue that Experiment 3 provided consistent evidence to support the broad object hypothesis.
The current research investigated the building blocks of VWM by testing three hypotheses: the strict object hypothesis, the broad object hypothesis, and the Boolean map hypothesis. Taking advantage of the ERP component CDA, which allows us to focus on the maintenance phase of VWM while avoiding other potential contamination, we examined in three experiments a critical point about which the three hypotheses generate distinct predictions. The mixed-3-object condition was the critical one: only two units of VWM will be occupied according to the Boolean map hypothesis, whereas three units (strict object hypothesis) or three units most of the time (broad object hypothesis) will be occupied according to the object-based hypotheses. Instead of showing no difference between the 2-object and mixed-3-object conditions, we consistently found that the mixed-3-object condition elicited slightly (but not significantly) lower CDA amplitudes than the 3-object condition but significantly higher CDA amplitudes than the 2-object condition. The above results were not affected by the specific feature dimension being tested (Experiments 1 and 2) or the probing method used in the experiment (Experiment 3). In addition, the behavioral performance of the mixed-3-object condition was considerably better than that of the 3-object condition in the three experiments and was worse than that of the 2-object condition in Experiments 1 and 2. Taken together, the current study provides the first evidence suggesting that the seemingly promising Boolean map hypothesis cannot replace the object-based hypothesis regarding VWM for objects composed of simple features. Instead, a broad object hypothesis for VWM storage is supported.
Although the Boolean map hypothesis could well explain many important aspects of previous VWM findings, the current study demonstrated consistent evidence against this hypothesis. Instead, the current study adds further evidence supporting object-based storage of simple objects (a broad object-based storage in particular). One may argue that the current findings at least partially support the Boolean map hypothesis: the behavioral results in the 2-object and mixed-3-object conditions were rather close, and in Experiment 3, they were virtually identical. However, as stated previously, the behavioral results represent final outputs after a series of processes in the human brain (e.g., encoding, consolidation, maintenance, retrieval, and comparison), whereas the current study was interested only in the nature of the building blocks of VWM, which function during the maintenance phase. Therefore, although behavioral performance sheds light on the information-processing aspects of VWM (see the final paragraph of the discussion), it cannot be used reliably to deduce the storage mechanism of VWM (e.g., Awh et al., 2007). In contrast, one of the greatest advantages of CDA as a neural marker is that it sensitively reflects the information stored in VWM and is not affected by other non-storage factors (Gao, Yin, Xu, Shui, & Shen, 2011; Ikkai et al., 2010; McCollough, Machizawa, & Vogel, 2007). Indeed, our understanding of the influence of object complexity on VWM capacity can be attributed to the adoption of CDA as a neural marker instead of the traditional behavioral performance (e.g., Diamantopoulou et al., 2011; Gao, Yin, et al., 2011; Luria et al., 2010; Gao et al., 2009). Therefore, the CDA results provide us with direct evidence on the current issue.
The Boolean map hypothesis would predict that the parietal and occipitoparietal regions, which are in charge of VWM maintenance (Xu & Chun, 2006; Todd & Marois, 2004), should exhibit the following pattern of CDA amplitudes by condition: (mixed-3-object = 2-object) < 3-object. However, a reverse pattern across the three experiments was consistently observed.
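The competing predictions described above can be stated compactly. As a notational sketch only (the symbol $A_c$ is introduced here for illustration and is not part of the original analyses), let $A_c$ denote the magnitude of the mean CDA amplitude in condition $c$. Assuming that CDA magnitude increases with the number of units maintained in VWM, the three hypotheses then predict:

```latex
\begin{align*}
\text{Boolean map hypothesis:}   &\quad A_{\text{mixed-3}} = A_{2} < A_{3} \\
\text{Strict object hypothesis:} &\quad A_{2} < A_{\text{mixed-3}} = A_{3} \\
\text{Broad object hypothesis:}  &\quad A_{2} < A_{\text{mixed-3}} \le A_{3}
\end{align*}
```

In the broad object case, $A_{\text{mixed-3}}$ is expected to lie close to $A_{3}$, because hierarchical objects are assumed to be created only in certain trials. The observed pattern, with a mixed-3-object CDA slightly but not significantly lower than the 3-object CDA and significantly higher than the 2-object CDA, is compatible with the third line and incompatible with the first.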
Although our investigation of the building blocks of VWM relied predominantly on the CDA results, the current behavioral and CDA results together support a broad object hypothesis of VWM storage. This hypothesis assumes that an object “is best defined by the perceptual groups that occupy top level of observers' hierarchical representation of the scene” (Anderson et al., 2012). Previous studies have suggested that Gestalt grouping is fairly automatic and is critical to VWM (Alvarez, 2011; Brady et al., 2011; Woodman, Vecera, & Luck, 2003). Because two of the three objects shared the same feature value in the mixed-3-object condition, Gestalt grouping could take place among them on the basis of feature similarity, which leads to the creation of hierarchical objects. However, considering that the objects were displayed randomly within the visual field, this similarity grouping will not always occur. Consequently, although this feature value repetition could facilitate participants' performance, it could not significantly reduce the maintenance load throughout the whole testing session. Moreover, this Gestalt grouping may occur in VWM instead of during the encoding phase, operating as a way to compress redundant information. This speculation receives support from our recent CDA study (Gao, Xu, et al., 2011) as well as Xu's fMRI study (Xu, 2009). These two studies used different neuroimaging methods yet reached the same conclusion: four identical objects were selected into VWM individually, and then only one identity representing the four identical colors was maintained in the superior intraparietal sulcus. We argue that Gestalt grouping may be the mechanism that compresses the four congruous identities into one. Accordingly, the brain regions exhibiting the grouping were close to the superior intraparietal sulcus.
In contrast to our speculation that the current Gestalt grouping took place after information selection of VWM, Xu and Chun (2007) explicitly manipulated the grouping of the displayed visual objects in the visual field and revealed that Gestalt grouping occurred during the selection stage of VWM. However, unlike in the current study, in which the grouping arose from the similarity of the targets, the grouping in Xu and Chun (2007) was task-irrelevant: two target shapes were presented on one black rectangle in the grouped condition and on different rectangles in the ungrouped condition. Thus, it seems that the way in which Gestalt grouping is induced affects the manner in which VWM processes the information, the exact characteristics of which require further elaboration.
The current CDA results verified the previous assumption that, when one feature value repetition is allowed in the memory array, objects are still maintained in VWM in an object-based manner. However, the current study also demonstrated that such repetition exerts an influence on performance, because hierarchical objects are created in VWM even though the similarity grouping cue is weak. Therefore, from a strict control perspective, memory arrays without feature repetition are preferred in future studies exploring the capacity of VWM.
Finally, although the current study did not support the use of Boolean map-based storage in VWM, the behavioral results of the current study provided tentative evidence that the Boolean map may be the unit for retrieval and comparison in VWM. Previous VWM studies on retrieval and comparison have suggested that detailed comparisons in VWM proceed according to a serial process, with one object addressed at a time (Yin et al., 2012; Houtkamp & Roelfsema, 2009; Hyun, Woodman, Vogel, Hollingworth, & Luck, 2009). This serial process is congenial to the cornerstone of the Boolean map theory: at each time point, only one Boolean map can be accessed (Huang & Pashler, 2007). Essentially, the retrieval and comparison process of VWM is closer to the nature of “access” used in the Boolean map theory. In addition, the results of two recent studies challenging the object-based storage hypothesis could be well explained by the Boolean map theory in terms of retrieval/access. Using distinct manipulations, researchers from two laboratories consistently obtained findings that contradict the object-based storage hypothesis: For bars containing simple colors and orientations (i.e., the stimuli used in current Experiments 2 and 3), memory/report errors for color and orientation were independent of each other (Bays, Wu, & Husain, 2011; Fougnie & Alvarez, 2011), although the hypothesis would predict a correlation between them. In those two studies, the experimental tasks were to select a correct feature value from a feature wheel or to report the feature value at one specific location. Both of these tasks utilized information retrieved from VWM. According to Boolean map theory, only one Boolean map with one featural label can be accessed at each time point.
Therefore, it is not surprising that independent results were obtained for features measured on distinct dimensions, because they are represented on independent Boolean maps for purposes of information retrieval according to the task requirements. With regard to the current study, we argue that both the hierarchical objects in the maintenance phase and the Boolean maps in the comparison phase contribute to the facilitation by feature value repetition observed in the behavioral performance. To conclude, although the current study rejected the hypothesis that Boolean maps constitute the building blocks of VWM storage, it possibly provided important evidence regarding the critical issue of the retrieval and comparison units in VWM, which has not been previously investigated.
This research was supported by the National Natural Science Foundation of China (31170974, 31170975, 31271089), Key Project of Humanities and Social Sciences, Ministry of Education (07JZD0029), the National Foundation for Fostering Talents of Basic Science (J0730753), the Social Sciences Foundation of Zhejiang Province (08CGWW006YBQ), the Research Fund of Department of Education of Zhejiang Province (Y201224811), and the Fundamental Research Funds for the Central Universities.
Reprint requests should be sent to Zaifeng Gao, Xixi Campus, Zhejiang University, Hang Zhou, P.R. China, 310028, or via e-mail: firstname.lastname@example.org.
Although how to define “weak” is unclear, in the current example, the similarity grouping cue is weak because the objects are displayed randomly.
The EEG data in O1/2 were very noisy, and the CDA amplitude at that location was very low; hence, the data from these sites were not used in further analysis.