Human memory benefits from information clustering, which can be accomplished by chunking. Chunking typically relies on expertise and strategy, and it is unknown whether perceptual clustering over time, through temporal integration, can also enhance working memory. The current study examined the attentional and working memory costs of temporal integration of successive target stimulus pairs embedded in rapid serial visual presentation. ERPs were measured as a function of behavioral reports: One target, two separate targets, or two targets reported as a single integrated target. N2pc amplitude, reflecting attentional processing, depended on the actual number of successive targets. The memory-related CDA and P3 components instead depended on the perceived number of targets irrespective of their actual succession. The report of two separate targets was associated with elevated amplitude, whereas integrated as well as actual single targets exhibited lower amplitude. Temporal integration thus provided an efficient means of processing sensory input, offloading working memory so that the features of two targets were consolidated and maintained at a cost similar to that of a single target.
Regardless of modality, virtually all information perceived by our senses is eventually clustered into larger structures before being stored in our memory systems. This so-called chunking process increases memory capacity, as more bits of information can be retained if clustering is possible (Miller, 1956). Chunking is thought to exist in two forms: The first is deliberate, often employed as a conscious mnemonic strategy, and the second is more automatic and continuously acts on perceptual representations (Gobet et al., 2001). Common to both is that chunking seems to take place at a relatively high level of abstraction, involving elaborate cognitive processes, as is evident when memorizing a sequence of individual words by generating a sentence that contains these words. Even when more automatic, perceptual chunking occurs, links to existing knowledge or expertise seem needed, as reflected by the classic memory advantage of expert chess players over novices for game positions. Neuroimaging research on chunking has furthermore implicated the lateral frontal cortex, a region thought to be involved in the high-level organization and control of working memory (Bor, Duncan, Wiseman, & Owen, 2003).
Under the right circumstances, however, effortless clustering of perceptual information in space is also possible. Because the visual cortex is organized along spatial dimensions (e.g., retinotopically in areas V1–V3; Tootell et al., 1995), space is special in that neighboring regions in the visual field are represented by neighboring neurons. Because proximal neurons are more likely to be closely connected than distal ones, neural activity in response to a coherent shape or figure will also show similar coherency, facilitating object perception. An object is indeed frequently perceived and later maintained in memory as a discrete spatial entity, that is, as a single coherent chunk (e.g., Luria, Balaban, Awh, & Vogel, 2016), even though it may carry different features (color, shape). These features are quickly bound together at an early perceptual stage, often by virtue of their common location (Wheeler & Treisman, 2002). Larger patterns in a visual scene can also often be recognized rapidly, particularly if they form a wholesome global figure: a Gestalt (for a review, see Wagemans et al., 2012). Our ability to quickly extract and later recall the gist of a visual scene, even within a single feed-forward sweep through the visual cortex (VanRullen & Koch, 2003; Thorpe, Fize, & Marlot, 1996), probably relies similarly on such visuospatial clustering.
Here we test the hypothesis that perceptual clustering across time can similarly afford chunking. Unlike the clear spatial organization of the visual cortex, its temporal organization is more implicit and emerges from variable neural activity patterns that might be generated endogenously as well as induced exogenously, such as oscillations (Buzsáki & Draguhn, 2004) and neural response latency (Lamme & Roelfsema, 2000). Forming a temporal representation, a so-called event, is thus intrinsically different from representing an object defined in space. Functionally, such a temporal event can be defined as a relatively brief time interval, within which information is integrated, but which is segregated from other events (Dixon & Di Lollo, 1994). The phenomenon of temporal integration has been observed in various tasks, but one particularly illustrative example comes from the rapid serial visual presentation (RSVP) paradigm. In the RSVP task developed by Akyürek et al. (2012), visually compatible but discrete individual stimuli were successively shown at the same location, for about 100 msec each. These stimuli could be perceived individually, but also as an integrated whole (similar to a “0” and a “–” image in a traditional alarm clock display, or the integrated “8”). Observers frequently reported the integrated percept comprising two or more individually presented stimuli (Akyürek & Wolff, 2016; Akyürek et al., 2012).
Temporal integration has traditionally been associated with basic, early perceptual functions, rather than cognitive ones, such as memory. For instance, early studies focused on the perceived simultaneity of stimuli, whether or not they were part of the same perceptual moment (Allport, 1968; Efron, 1967). Later work targeted the lingering impression made by stimuli, referred to as visible persistence, hypothesizing that this might provide a bridge to succeeding stimuli (Hogben & Di Lollo, 1974; Eriksen & Collins, 1967). A well-known conceptualization of such persistence is iconic memory (Neisser, 1967). This kind of fleeting memory is thought to be limitless, so that any change in temporal integration rate should not impact upon later, more cognitive processing and storage. Recent studies, however, indicate that this is unlikely.
One important clue comes from the observation that the integration of sensory input over time seems to depend on one's current perceptual pace, that is, the interval across which integration occurs scales in response to perceived task demands (Akyürek, Toffanin, & Hommel, 2008; Akyürek, Riddell, Toffanin, & Hommel, 2007). This adaptive nature of temporal integration suggests that something can be gained from slowing down one's perceptual pace. At first glance, integration seems primarily lossy, because there is no evidence that the individual stimuli are still available after temporal integration has taken place: In the alarm clock example, participants report having seen just the “8” and have no recollection of the individually presented stimuli. This suggests that integration introduces irreversible temporal compression by reducing how well the resulting percept reflects the actual, discrete stimuli and the temporal characteristics of the sensory input. In addition, a temporally integrated representation can only become available for further processing after the last to-be-integrated stimulus has been presented, violating the maxim that speed of processing is vital to perception. However, these costs may be tolerable, as the informational value of perceiving discrete images during such short intervals may not always increase beyond what can be gained by a combined image. The optimal interval may thus rather be ecologically constrained, so that we need to constantly weigh the possible efficiency gained from having fewer events (chunks) due to temporal integration against the higher information fidelity without integration.
In this study, we used electrophysiological recordings to directly test the hypothesis that temporal integration is associated with increased cognitive efficiency, in particular during attentional selection, and during the encoding and maintenance of information in working memory. We measured the ERP during an RSVP task in which participants had to detect either a single or two sequentially presented targets among distractor stimuli, and we determined whether the ERPs support the hypothesis that temporal integration is more efficient. The RSVP task is known to result in a commensurate report rate of single, dual, and integrated dual targets while keeping stimulus conditions constant (Wolff, Scholz, Akyürek, & van Rijn, 2015; Akyürek et al., 2012), which enables a balanced measure of the ERP. Furthermore, because participants are always explicitly asked to identify the two targets individually, the observed frequency of integration responses is as unbiased as possible.
ERP component amplitude can be assessed to reflect on the degree to which functional processes related to temporal integration are being engaged. Here, we will specifically focus on the N2pc component that is associated with attentional processing of stimulus features and on the CDA (contralateral delay activity) and the P3 component that are associated with consolidation and maintenance in working memory.
The N2pc is a relatively early, lateralized component, which tracks the attentional processing of visually presented stimulus features across the hemifields (Kiss, van Velzen, & Eimer, 2008; Eimer, 1996; Luck & Hillyard, 1994). Lower N2pc amplitude should correspond to less attentional effort and/or higher efficiency. As the N2pc reflects relatively early attentional processes, we expect that the N2pc indexes the number of presented individual RSVP targets, irrespective of whether the later report on the presented stimuli indicates that the elements were integrated or not. Alternatively, if temporal integration allows the joint attentional processing of both targets, N2pc amplitude should follow suit: lower amplitude for single and integrated targets than for dual targets that are perceived separately.
The second and third components to be examined are both related to working memory load. The CDA (Vogel & Machizawa, 2004), also known as the SPCN (sustained posterior contralateral negativity; e.g., Jolicœur, Sessa, Dell'Acqua, & Robitaille, 2006), is a negative deflection measured at parietal scalp sites that occurs during the retention period over the hemisphere contralateral to the memorized items. It may be noted that, despite its distribution observed at the scalp, the CDA may actually depend on a much broader, distributed neural network that includes parietal as well as frontal sites (Reinhart et al., 2012). The CDA amplitude increases with the number of objects (and/or their features) that are being held in working memory, independent of perceptual and attentional demands (Brady, Störmer, & Alvarez, 2016; Ikkai, McCollough, & Vogel, 2010). The third component is the P3, a sustained positive deflection measured at parietal midline locations. The P3 has been related to the process of memory consolidation and to making a response decision (Polich, 2007; Verleger, Jaśkowski, & Wascher, 2005; Kok, 2001). The P3 amplitude has also been shown to vary with increasing working memory load (e.g., Akyürek, Leszczyński, & Schubö, 2010).
As for the N2pc, for both the CDA and the P3 it is assumed that lower component amplitude should correspond to less effort, either because there was less to encode or maintain in memory or because these processes were more efficient in one case than another. Such efficiency might also be understood as the computational benefits associated with storing a certain amount of information in a single chunk in memory over storing the same amount of information in two separate chunks. Obviously, handling more items (chunks) carries cognitive costs. This computational account has been applied to similar tasks, successfully explaining a number of attentional and effort-based phenomena (e.g., Wierda, Van Rijn, Taatgen, & Martens, 2010; Taatgen, Juvina, Schipper, Borst, & Martens, 2009). Interestingly, in these models, early transfers of perceptual information to memory imply that less will accumulate in a single chunk, whereas late transfers imply the opposite.
For the purposes of this study, comparing amplitude on trials associated with integration and segregation is thus the critical test for determining whether the clustering of incoming information into an integrated percept is actually more efficient than segregated representations of its parts: Based on our hypothesis that temporal integration results in higher efficiency, we predict lower and similar amplitudes for the one-target and integrated two-target trials, whereas two-target trials that are separately represented should elicit higher amplitude.
The empirical results indeed showed that the N2pc tracks the number of presented targets, irrespective of the number of reported targets. More importantly, the CDA and P3 results provided evidence for the principal hypothesis, with similar amplitudes for one-target and integrated trials and higher amplitude for segregated trials. This suggests that chunking by temporal integration provides benefits in terms of neural efficiency through more efficient memory encoding.
Thirty-nine undergraduate psychology students (21 men and 18 women) from the University of Groningen participated, either for course credit or voluntarily. All participants reported normal or corrected-to-normal visual acuity. The participants signed informed consent sheets before participation and were unfamiliar with the experimental paradigm and its purpose. The study was conducted in accordance with the Declaration of Helsinki and was approved beforehand by the ethical committee of the Department of Psychology with registration number 13043-NE(b). Before the analyses, a criterion of at least 10 artifact-free EEG segments per cell of the experimental design was chosen, leading to the exclusion of seven participants (five women and two men). Mean age of the final sample was 20.9 years (range 18–24 years).
Apparatus and Stimuli
Participants received written instructions and were seated in a noise-dampened, dimly lit testing chamber at a viewing distance of approximately 50 cm from the 17-in. CRT computer screen. Screen resolution was set to 800 × 600 pixels at 16-bit color depth and a refresh rate of 100 Hz. The experiment was programmed in E-Prime 2.0 Professional (Psychology Software Tools, Inc., Pittsburgh, PA) and run on a standard computer running Windows XP (Microsoft, Inc., Redmond, WA). Participants were informed that they were monitored with a camera mounted in the testing chamber and that a two-way intercom system was available for communication.
As shown in Figure 1, all stimuli were drawn in black, bold Courier New Font on a white background, with the exception of the blue target figures. Targets consisted of combinations of one to four corner lines, each 5 pixels thick and extending 14 pixels in both directions. With 6 pixels distance between corners, the full image (i.e., with four corners present) covered an area of 34 × 34 pixels. Targets were drawn randomly, without replacement, from the set of all possible targets, so that each type appeared with equal frequency. However, to be able to distinguish integrated percepts from individual targets, the constraint was added that within one trial a corner was never repeated between targets. The distractor letters were chosen randomly from the full alphabet without replacement on each trial and appeared in 36 point size covering the same visual area as the targets. The central fixation dot was an apostrophe (′) in the same font size. The distance between the two RSVP streams was 64 pixels (8% screen width). Less than and greater than signs (< or >) in 14 point size cued the stream in which the targets appeared.
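The stimulus constraints described above can be illustrated with a minimal Python sketch. It is not the E-Prime implementation used in the experiment: the corner names, the function names, and the use of rejection sampling are assumptions made for illustration, and the sketch does not reproduce the balancing of target frequencies across trials. It only shows the target set, the no-shared-corner constraint, and how an integrated percept corresponds to the union of the two targets.

```python
import itertools
import random

# Hypothetical corner labels; the source only states that targets consist
# of one to four corner lines.
CORNERS = ("top_left", "top_right", "bottom_left", "bottom_right")

ARM_PX = 14   # each corner line extends 14 pixels in both directions
GAP_PX = 6    # distance between corners
FULL_SIZE_PX = 2 * ARM_PX + GAP_PX  # side length of the full four-corner image


def all_targets():
    """Return every non-empty combination of corners as a frozenset."""
    return [
        frozenset(combo)
        for size in range(1, len(CORNERS) + 1)
        for combo in itertools.combinations(CORNERS, size)
    ]


def draw_target_pair(rng):
    """Draw two targets that share no corner (simple rejection sampling)."""
    targets = all_targets()
    while True:
        t1, t2 = rng.sample(targets, 2)
        if not (t1 & t2):
            return t1, t2


def integrated_percept(t1, t2):
    """The integrated percept contains the corners of both targets."""
    return t1 | t2


if __name__ == "__main__":
    rng = random.Random(1)
    t1, t2 = draw_target_pair(rng)
    print(sorted(t1), sorted(t2), sorted(integrated_percept(t1, t2)))
```

Because the two targets never share a corner, the union is always distinguishable from either individual target, which is what allows integrated reports to be identified.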
The experiment consisted of 520 self-initiated trials, with a self-timed break after 260 trials. Participants took approximately 50 min to complete the experiment. The first block of 40 trials was considered practice and was discarded from analysis. Two-target trials (70%), one-target trials (20%), and catch trials (10%) were randomly intermixed. Catch trials encouraged participants to maintain central fixation; on these trials, the fixation dot underwent a subtle change into a quotation mark ("), to which the participants should immediately respond by pressing the space bar. Correct detection was rewarded by the appearance of a green screen and the message “Well spotted! :).” Misses were followed by a red screen and the message “You missed the change! :(.”
Trials started with a blank screen for 500 msec, followed by the cue for 600 msec, another blank screen for 400 msec, and the fixation dot for 200 msec. Thereafter, the synchronized RSVP streams on both sides of the fixation dot appeared. Each stream contained 16 stimuli, shown for 70 msec each, with an ISI of 10 msec. On target-present trials, the first target appeared randomly at the fifth or seventh position, with equal frequency. If there was a second target, it followed directly without intermittent distractors (i.e., at Lag 1). On catch trials, the fixation point changed randomly at the sixth or eighth position, again with equal frequency. After RSVP offset, a blank screen was presented for 200 msec followed by two successive response screens, which self-terminated after 1600 msec each. Participants entered the perceived target by pressing the numbers 1, 2, 4, and 5 on the numeric keypad, corresponding to the layout of its individual corners (1, bottom left; 2, bottom right; 4, top left; 5, top right), equalizing response demands. Participants concluded their input by pressing Enter, which they could also do without entering any corners in case they had not seen the corresponding target. The participants were nevertheless instructed to guess when they were merely uncertain. There was no feedback about performance on target trials.
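The trial timing described above can be made concrete with a short sketch. The durations are taken from the text; the function names are hypothetical, and the sketch assumes that the RSVP stream begins immediately after the 200-msec fixation period, which the text implies but does not state exactly.

```python
STIM_MS = 70    # duration of each RSVP item
ISI_MS = 10     # blank interval between items
SOA_MS = STIM_MS + ISI_MS  # onset-to-onset interval between successive items
N_ITEMS = 16    # items per stream

# Blank screen, cue, blank screen, fixation dot (all in msec).
PRE_STREAM_MS = 500 + 600 + 400 + 200


def item_onset(position):
    """Onset in msec from trial start of the item at a 1-based position."""
    if not 1 <= position <= N_ITEMS:
        raise ValueError("position must be between 1 and 16")
    return PRE_STREAM_MS + (position - 1) * SOA_MS


def target_onsets(t1_position, two_targets=True):
    """Onsets of the target(s); the second target follows at Lag 1."""
    onsets = [item_onset(t1_position)]
    if two_targets:
        onsets.append(item_onset(t1_position + 1))
    return onsets


if __name__ == "__main__":
    for position in (5, 7):
        print(position, target_onsets(position))
```

With these values, the two targets in a two-target trial are separated by a stimulus onset asynchrony of 80 msec.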
The primary purpose of the experiment was to examine and compare the ERPs associated with the processing of a single target, as well as that of two targets, dependent on how these were perceived. Comparisons were thus made between three response categories, namely, correctly identified targets in one-target trials, two correctly identified targets in two-target trials (regardless of report order), and two correctly integrated targets in two-target trials. Following Akyürek et al. (2012), a correct report classification of any target required its exact identification within a single response prompt, and feature exchanges or migrations of any kind between targets were not included in the measures.
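The classification of two-target trials into response categories can be sketched as follows. This is an illustrative reading of the criteria above, not the authors' scoring code: the function name is hypothetical, targets and responses are represented as sets of keypad codes, and the sketch assumes that an integration report consists of the combined target entered at one prompt with the other prompt left empty, a detail the text does not spell out.

```python
EMPTY = frozenset()


def classify_two_target_trial(t1, t2, resp1, resp2):
    """Classify the two responses given on a two-target trial.

    Targets and responses are collections of keypad codes
    (1, 2, 4, 5), one per corner.
    """
    t1, t2, resp1, resp2 = (frozenset(x) for x in (t1, t2, resp1, resp2))
    responses = {resp1, resp2}
    # Both targets identified exactly, regardless of report order.
    if responses == {t1, t2}:
        return "two_targets"
    # Assumption: the union of both targets reported at one prompt,
    # with nothing entered at the other prompt.
    if responses == {t1 | t2, EMPTY}:
        return "integrated"
    # Everything else, including feature exchanges between targets.
    return "other"


if __name__ == "__main__":
    print(classify_two_target_trial({1}, {5}, {5}, {1}))
    print(classify_two_target_trial({1}, {5}, {1, 5}, set()))
    print(classify_two_target_trial({1}, {5}, {1, 2}, {5}))
```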
Electrophysiological Recording and Analysis
EEG was continuously measured with 64 Sn electrodes, arranged according to the extended International 10–20 system. Additional electrodes at the outer canthi of both eyes and below and above the left eye measured the horizontal and vertical EOG, respectively. A sternum electrode served as ground, and an average reference was used during recording. Electrode impedances were kept below 10 kΩ. The EEG was amplified by means of a REFA 8-72 amplifier with a 140-Hz cutoff filter and recorded at a 500-Hz sampling rate.
Data preprocessing was performed with Brain Vision Analyzer 2.1 (Brain Products GmbH, München, Germany). The data were re-referenced against the average of both mastoid electrodes, and Butterworth Zero Phase filters were applied with a 40-Hz low pass at −12 dB and a 0.1-Hz high pass at −6 dB. Segments of EEG were time-locked to the onset of the second target, ranging from −200 to 1000 msec poststimulus. Trials in which horizontal eye movements occurred (voltage steps greater than 50 μV or more than 80 μV difference across the segment) were discarded. Vertical eye movements and blinks were corrected with the Gratton–Coles procedure (Gratton, Coles, & Donchin, 1983). For each electrode, segments with other artifacts (amplitudes in excess of ±80 μV or differences below 0.1 μV across 100 msec) were also discarded. The PO7 electrode of one participant malfunctioned, which was corrected by means of topographical spline interpolation. Finally, the 200 msec before stimulus onset was used for baseline correction.
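The rejection criteria above can be expressed as a small sketch in plain Python. It is not the Brain Vision Analyzer procedure itself: the function names are hypothetical, a "voltage step" is assumed to mean the difference between successive samples, and the low-activity check is assumed to apply to every 100-msec window within a segment.

```python
SAMPLE_RATE_HZ = 500  # sampling rate of the recording


def has_horizontal_eye_movement(heog_uv, max_step=50.0, max_range=80.0):
    """Flag a horizontal EOG segment (in microvolts) for rejection."""
    # Assumption: a voltage step is the difference between adjacent samples.
    steps = (abs(b - a) for a, b in zip(heog_uv, heog_uv[1:]))
    too_steep = any(step > max_step for step in steps)
    too_wide = (max(heog_uv) - min(heog_uv)) > max_range
    return too_steep or too_wide


def has_other_artifact(channel_uv, max_abs=80.0, min_activity=0.1,
                       window_ms=100):
    """Flag an EEG segment with extreme amplitudes or a flat stretch."""
    if any(abs(value) > max_abs for value in channel_uv):
        return True
    n = int(window_ms * SAMPLE_RATE_HZ / 1000)  # samples per window
    for start in range(0, len(channel_uv) - n + 1):
        window = channel_uv[start:start + n]
        if max(window) - min(window) < min_activity:
            return True  # less than min_activity microvolts of change
    return False


if __name__ == "__main__":
    sawtooth = [float((i % 20) - 10) for i in range(600)]
    print(has_horizontal_eye_movement(sawtooth), has_other_artifact(sawtooth))
```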
As indicated, the average amplitude of three ERP components was examined. The first component was the N2pc, providing an index of attentional processing. The N2pc was measured as the difference between contra- and ipsilateral sites of the commonly used PO7/PO8 electrode pair (e.g., Akyürek, Leszczyński, & Schubö, 2010), relative to the visual hemifield of the target, from 90 to 170 msec poststimulus. The second component was the CDA, which was chosen as an index of working memory load. The CDA was measured as a difference wave between contra- and ipsilateral sites relative to the target at the P7/P8 electrode pair, which has previously been used to measure CDA amplitude (e.g., Jolicœur, Brisson, & Robitaille, 2008; McCollough, Machizawa, & Vogel, 2007), in two windows, from 200 to 600 msec and from 600 to 1000 msec poststimulus. Although other neighboring pairs, such as PO3/PO4 and P5/P6, also reflected the expected differences between the veridically different load conditions (i.e., one vs. two targets; unrelated to our hypotheses with regard to integration), the P7/P8 pair carries the advantage of being spatially displaced from the PO7/PO8 pair at which we measured the N2pc, thereby presumably providing a more independent sample. The choice of the two broad, equally sized analysis time windows was motivated by the observation that, in the veridically different conditions, the load sensitivity associated with the CDA seemed to develop relatively late. Any further hypothesized differences on this component should thus presumably follow suit. Bonferroni correction was applied to guard against false positives resulting from testing both windows. The third component of interest was the P3, measured at the traditional Pz electrode, from 300 to 600 msec, which was taken to reflect working memory consolidation.
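The lateralized measures can be summarized in a brief sketch of a contralateral-minus-ipsilateral mean amplitude. The function names and the plain-list data format are assumptions for illustration; the epoch limits and sampling rate follow the recording description, and the sketch assumes that each segment holds 600 samples spanning −200 to 1000 msec.

```python
SAMPLE_RATE_HZ = 500
EPOCH_START_MS = -200  # segments start 200 msec before second-target onset


def ms_to_index(t_ms):
    """Convert a latency relative to second-target onset to a sample index."""
    return round((t_ms - EPOCH_START_MS) * SAMPLE_RATE_HZ / 1000)


def mean_amplitude(wave, start_ms, end_ms):
    """Mean of a waveform between two latencies (end sample excluded)."""
    segment = wave[ms_to_index(start_ms):ms_to_index(end_ms)]
    return sum(segment) / len(segment)


def lateralized_amplitude(left_site, right_site, target_side,
                          start_ms, end_ms):
    """Mean contralateral-minus-ipsilateral amplitude in a time window.

    left_site and right_site are averaged waveforms from a left- and a
    right-hemisphere electrode (e.g., PO7 and PO8).
    """
    if target_side == "left":
        contra, ipsi = right_site, left_site
    elif target_side == "right":
        contra, ipsi = left_site, right_site
    else:
        raise ValueError("target_side must be 'left' or 'right'")
    difference = [c - i for c, i in zip(contra, ipsi)]
    return mean_amplitude(difference, start_ms, end_ms)


if __name__ == "__main__":
    po7 = [0.0] * 600
    po8 = [-2.0] * 600
    print(lateralized_amplitude(po7, po8, "left", 90, 170))
```

The same function applies to the P7/P8 pair with the CDA windows, whereas the P3 at Pz requires only the nonlateralized `mean_amplitude`.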
Linear mixed-effects modeling was used to test the hypotheses with regard to the aforementioned components elicited in each response category (i.e., single target correct, two targets correct, or two targets integrated). Because response coding was used to define the independent variable, linear mixed-effects modeling is a well-suited statistical choice, as it is less affected by unbalanced designs than standard regression models and enables accounting for participant-specific effects (see Baayen, Davidson, & Bates, 2008). The analysis was carried out in R (R Core Team, 2013), with the lme4 package (Bates, Mächler, Bolker, & Walker, 2015).
The most general model included random intercepts for participants and by-participant random slopes for response category and visual field (i.e., targets shown in the left or right visual hemifield). Starting from this model, structural simplification was performed by likelihood ratio testing of the model with the effect in question against the model without it. This allowed for a more robust likelihood ratio test of the effect of interest (i.e., the fixed effect of response category), using the final and most parsimonious model. In addition to linear mixed-effects modeling, we computed the Bayes Factors (BF) of effects using the BayesFactor package in R (Morey, Rouder, & Jamil, 2015). In case of a significant effect of response category, differences across the three categories were further assessed using Tukey's honestly significant difference (HSD) test (glht from the multcomp package; Bretz, Hothorn, & Westfall, 2010) to correct for multiple comparisons (Tukey, 1994). Response category was entered as factor, with the effect of one target encapsulated by the intercept and the differential effects of two targets and integrated targets separately estimated. The assumptions underlying the linear mixed-effects models were tested by visual inspection of normality and residual plots, and all models appeared to satisfy the relevant underlying assumptions. Finally, the influence.ME package (Nieuwenhuis, Te Grotenhuis, & Pelzer, 2012) was used to test for outliers at the participant level. Although outlying participants influenced the reported results beyond the suggested cutoff value of 4/n for Cook's distance, applying sensitivity analyses by excluding the relevant participants revealed no qualitative difference in the reported results.
RESULTS AND DISCUSSION
The behavioral data showed that in single-target trials, correct identifications averaged 55.5% (SEM = 3.3%, range 20–88%). In the more frequently presented two-target trials, correct identifications of both targets were achieved in 24.6% (SEM = 2.3%, range 4–59%) of trials, regardless of report order. Correct integrations occurred on 18.5% (SEM = 2.2%, range 3–53%) of trials. Trials in which only one of the two targets was reported correctly averaged 24.7% (SEM = 1.5%, range 10–46%). Both accuracy and absolute integration rates were comparable to a previous study that used a similar task in a single-stream variant (Akyürek et al., 2012). Finally, 98% (SEM = 0.6%, range 85–100%) of the catch trials were correctly responded to.
The N2pc difference wave is displayed in Figure 2. Likelihood ratio testing of the effect of response category showed that the response category (βtwo targets = −0.78, SE = 0.21, t = −3.68, βintegrated targets = −0.88, SE = 0.22, t = −3.93, χ2(2) = 19.13, p < .001) significantly added to the most parsimonious model that included fixed effects for conditions and visual field as well as by-participant random intercepts and by-participant random slopes for visual field (see also Table 1). Tukey's HSD tests further revealed that the single target and two targets conditions were different (z = −3.67, p < .001, BF < 0.01) and so were the integrated targets and single target conditions (z = −3.93, p < .001, BF = 0.04). There appeared to be no difference between two integrated targets and two separately reported targets (β = −0.10, SE = 0.21, z = −0.47, p = .884, BF = 16.03). These results suggest that the attentional processing in this early phase depended more on the number of targets shown, rather than on the (eventual) perceptual outcome.
| Fixed effects, β (SE) | N2pc (a) | CDA, early (a) | CDA, late (a) | P3 (b) |
|---|---|---|---|---|
| Intercept | −2.51 (0.47) | −1.25 (0.24) | −1.89 (0.52) | 5.47 (0.62) |
| Two targets | −0.78 (0.21)* | −0.44 (0.25) | −1.01 (0.32)* | 0.80 (0.29)* |
| Integrated targets | −0.88 (0.22)* | −0.28 (0.26) | −0.16 (0.34)* | 0.08 (0.31)* |
| Visual field | −1.79 (0.70) | – | 2.49 (0.99) | – |

(a) The most parsimonious model had by-participant random intercepts and by-participant random slopes for the effect of visual field.
(b) The most parsimonious model had a by-participant random intercept only.
* p < .05 as determined by likelihood ratio testing of this model against the model without response condition effects.
The difference waves in Figure 3 show the CDA components elicited in each condition. In the early window (200–600 msec), testing whether the condition effect added anything to the most parsimonious mixed-effects model using a likelihood ratio test revealed no evidence that there were differences between any of the response categories in the early CDA window (βtwo targets = −0.44, SE = 0.25, t = −1.75, BF = 3.40, βintegrated targets = −0.28, SE = 0.26, t = −1.06, BF = 9.11, χ2(2) = 3.09, p = .42 [0.21 uncorrected]; see Table 1). Thus, it appears as if the CDA effect did not start during this early time window directly following the N2pc.
The same likelihood ratio test applied to the most parsimonious model in the late window (600–1000 msec) revealed a significant effect of the response categories, even after Bonferroni correction (βtwo targets = −1.01, SE = 0.32, t = −3.13, βintegrated targets = −0.16, SE = 0.34, t = −0.46, χ2(2) = 12.27, p = .004; see Table 1). Examination of the differences underlying this effect revealed three findings: (1) single targets differed from two targets (z = −3.13, p = .005, BF = 0.23), (2) integrated targets differed from two targets (β = 0.85, SE = 0.31, z = 2.76, p = .016, BF = 0.79), and (3) integrated targets did not differ from single targets (z = −0.46, p = .891, BF = 12.82). Note that the Bayes factors suggest that the effect was mainly driven by the similarity between integrated and single targets. Taken together, these results thus suggested that the CDA in the late window reflects the veridical load effect between single and dual targets, but most importantly, that the temporal integration of two targets reduced memory load to the extent that it could no longer be distinguished from the maintenance of a single target.
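As a worked check on the reported p values, the likelihood ratio statistics for the two CDA windows can be converted by hand. For a chi-square distribution with two degrees of freedom, the upper-tail probability has the closed form exp(−x/2). The sketch below uses only this identity and a Bonferroni factor of 2 for the two windows; the function names are hypothetical, and because the inputs are the rounded statistics from the text, small deviations in the last digit are to be expected.

```python
import math


def chi2_upper_tail_df2(statistic):
    """Upper-tail probability of a chi-square statistic with df = 2."""
    return math.exp(-statistic / 2.0)


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p value, capped at 1."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Rounded likelihood ratio statistics as reported in the text.
    for label, statistic in (("early CDA", 3.09), ("late CDA", 12.27)):
        p_raw = chi2_upper_tail_df2(statistic)
        p_corrected = bonferroni(p_raw, n_tests=2)
        print(f"{label}: uncorrected p = {p_raw:.3f}, "
              f"corrected p = {p_corrected:.3f}")
```

The uncorrected value for the early window comes out at about .21 and the corrected value for the late window at about .004, in line with the values reported above.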
The P3 is shown in Figure 4 for the different response conditions. Likelihood ratio testing revealed that response categories added significant information to the most parsimonious model (βtwo targets = 0.80, SE = 0.29, t = 2.75, βintegrated targets = 0.08, SE = 0.31, t = 0.25, χ2(2) = 9.38, p = .009; see Table 1). Tukey's HSD tests revealed that the significant differences arose from differences between single targets and two targets (z = 2.75, p = .017, BF = 0.02), as well as between integrated targets and two targets (β = −0.73, SE = 0.30, z = −2.44, p = .039, BF < 0.01), whereas there was no difference between integrated targets and single targets (z = 0.25, p = .966, BF = 7.51). These results were thus fully in line with those of the CDA, indicating that we failed to observe differences in P3 amplitude between single and integrated targets but did observe differences when comparing these categories with that of two separately reported targets.
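The pairwise estimates for integrated versus two separately reported targets follow from the treatment coding of response category, in which the single-target condition is the intercept and the other two conditions are estimated as differences from it. A minimal sketch, using the rounded coefficients from Table 1 (the dictionary and function names are hypothetical), recovers those contrasts as the difference between the two coefficients.

```python
# Rounded fixed-effect estimates from Table 1; each coefficient is the
# difference from the single-target condition (the intercept).
TABLE_1 = {
    "N2pc": {"two_targets": -0.78, "integrated": -0.88},
    "CDA_late": {"two_targets": -1.01, "integrated": -0.16},
    "P3": {"two_targets": 0.80, "integrated": 0.08},
}


def integrated_minus_two(component):
    """Contrast between integrated and separately reported targets."""
    betas = TABLE_1[component]
    return round(betas["integrated"] - betas["two_targets"], 2)


if __name__ == "__main__":
    for component in TABLE_1:
        print(component, integrated_minus_two(component))
```

The values obtained this way match the reported contrasts for the N2pc (−0.10) and the late CDA window (0.85). For the P3, the rounded coefficients give −0.72 rather than the reported −0.73, a discrepancy that is consistent with rounding of the tabulated estimates.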
The results were clear: CDA and P3 amplitude showed that temporal integration of two distinct, successive visual targets allowed observers to consolidate and maintain them in working memory at a clearly reduced cost (as expressed in component amplitude), similar to that of a single target. The efficiency gained by integration was thus substantial, and it can be concluded that our cortically “prewired” tendency to make memory-efficient visuospatial clusters (Luria et al., 2016) is mirrored by a more implicit ability to form efficient temporal events. Apart from this primary result, evidence from the N2pc also suggested attentional processing was not noticeably altered by temporal integration, in that two targets always required more attentional processing than a single target, regardless of integration.
The Bayesian analyses indicated that these effects were robust, even though the number of trials available for the lateral analyses in particular was relatively low, compared with other paradigms used in previous studies (e.g., McCollough et al., 2007). The results furthermore reliably replicated effects that were to be expected, such as the amplitude differences between single and dual target reports observed at each of the examined components. Additionally, the new findings with regard to the fate of integrated targets were strengthened by the convergence between the nonlateral analysis of the P3 and the CDA, both of which exhibited the same pattern of differences between the conditions.
The primary measure in this study was CDA component amplitude, which has been shown to respond quite directly to the load placed on working memory (Vogel & Machizawa, 2004). In this study, the measure was validated by the to-be-expected load effects associated with one- and two-target trials. The CDA component seemed relatively slow to reach its maximum amplitude, which has been taken as a reflection of when items in working memory attain a durable state. It has been shown that higher memory loads can delay the component up to 500 msec for 75% amplitude for four items or more (Perez & Vogel, 2012). This suggests that the current target symbols posed a relatively high load on working memory maintenance, possibly in part due to the processing demands of the ongoing RSVP—something which might affect the consolidation of more automated alphanumeric stimuli to a lesser extent (cf. Jolicœur et al., 2008). Most importantly, the CDA reflected that the load associated with the maintenance of an integrated representation of two targets could not be distinguished from that of a single target.
The analysis of P3 amplitude revealed a pattern of results that was quite similar to that of the CDA. Although theoretical accounts of the P3 vary in terminology, P3(b) amplitude can probably be safely understood as a reflection of memory consolidation effort (Polich, 2007; Verleger et al., 2005; Kok, 2001). In RSVP, it has previously been shown to reflect the comparative amount of information being consolidated (e.g., Akyürek, Leszczyński, & Schubö, 2010; see also Wierda, Van Rijn, Taatgen, & Martens, 2012). In the current study, the single-target and integrated-target conditions showed similar amplitude, whereas the two-target amplitude was clearly elevated. The results thus suggested that there is more to consolidate when targets are represented separately, reflective of costs associated with the processing of two events rather than one. The observed pattern of differences in P3 amplitude may be taken to mean that reduced load on working memory (i.e., the CDA effect) is preceded by reduced consolidation effort. The combined evidence from the CDA and P3 thus suggests that temporal integration offers an efficient way to store and maintain information in memory and thereby affords a means of chunking.
It is tempting to frame the current outcomes in terms of classic, quantitative accounts of memory (Cowan, 2001; Miller, 1956), in which working memory consists of a number of storage slots that govern its total capacity. The current results could therein be interpreted such that targets perceived separately consume two memory slots, and integrated targets only one. Yet, if only because direct evidence from recall precision is not available in the current task, the results do not rule out a more qualitative interpretation, in which working memory capacity is viewed as a more flexible resource (Ma, Husain, & Bays, 2014; Bays & Husain, 2008). Working memory capacity in resource-based models is not determined by a fixed number of slots but rather by a pool of cognitive resources that can be spread across the items to be retained. Thus, one can remember either a small number of items with high precision or a large number with low precision. Although resource-based models seem primarily feature-based, it remains conceivable that resources could be spent on either individual or integrated features (if they are compatible) and that the latter option might thereby enjoy chunking benefits and reduced allotment of resources, without necessarily sacrificing precision.
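The contrast between the two classes of models can be made concrete with a deliberately simplified sketch. It is a toy illustration, not a model fitted to the present data: the slot count of four, the even split of resources across items, and the mapping of report types onto item counts (one item for an integrated pair, two for a separately reported pair) are all assumptions made for the example.

```python
def slot_model(n_items, n_slots=4):
    """Number of items stored when each item occupies one discrete slot.

    n_slots=4 is an illustrative assumption, not a value estimated here.
    """
    return min(n_items, n_slots)


def resource_model(n_items, total_resource=1.0):
    """Share of a continuous resource per item, assuming an even split.

    Real resource models need not split evenly; this is a simplification.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return total_resource / n_items


# Hypothetical mapping of report types onto the number of stored items.
conditions = [("single target", 1), ("integrated pair", 1), ("separate pair", 2)]
for label, n_items in conditions:
    print(label, slot_model(n_items), resource_model(n_items))
```

Under either toy model, the integrated pair is treated exactly like a single target, whereas the separately reported pair occupies an extra slot or halves the per-item resource share. The sketch therefore does not discriminate between the two accounts, in line with the argument above.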
In the current paradigm, the reported number of features (i.e., individual corners) was strongly dependent on the number of (constituent) targets: In (correct) one-target trials, the average number of reported features was 1.91, clearly less than that in two-target trials, which averaged 2.86 features in a single response for integration trials and 2.82 features summed over two responses for separately reported targets. Thus, the only tangible difference between target pairs that were integrated and pairs that were not was the perceived number of discrete events. This suggests that making such event representations carries working memory costs that clearly exceed those associated with the storage of the features themselves: When a single event was perceived, neither P3 nor CDA showed much further difference between true single-target representations and integrated target pairs, even though the latter contained more features on average. At the same time, retaining a similar number of features across two separate targets was associated with elevated component amplitude. Without disqualifying the possible role of resource sharing, these results thus support the view that some form of discrete, structured (object) representation also influences working memory capacity, as proposed by some authors (Brady, Konkle, & Alvarez, 2011; Zhang & Luck, 2008, 2011; Rouder et al., 2008; Alvarez & Cavanagh, 2004; Olson & Jiang, 2002). This need not imply that such integrated object representations are always the default in working memory, although it has been shown that object-based properties do influence performance even when a task is primarily feature-based (Vergauwe & Cowan, 2015). In the current paradigm, neither type of representation was explicitly favored, but it will be an interesting topic for future research to examine to what extent the current pattern of results will hold when feature complexity is increased and precision becomes more important.
What neural mechanisms might underlie the effects of temporal integration on working memory? Although this study was not aimed at directly investigating the full scope of this question, consideration of such mechanisms may help to provide further context to the present findings. Because of its inherent periodicity, oscillatory neural activity, particularly at lower frequencies, may play an important role. If one assumes that high-frequency synchronization binds different features within a single scene (i.e., in space) and that synchronization at lower frequencies may do the same across time, then cross-frequency coupling (CFC) in particular seems to be a fitting and elegant way to achieve temporal integration. Through CFC, a high-frequency rhythm can fall in line with a lower frequency, such that the amplitude of the former may be boosted when it matches the phase of the latter (note that other modulations of amplitude and phase are also possible). Such coupling instantiates a natural link between rapid, more localized neural codes and slower, more distributed ones (Canolty & Knight, 2010). Such codes may be related to the near-instantaneous representation of visual features and more gradually emerging episodic representations, respectively. This also implies that, at the latter level, unrelated features at the local level may be integrated into an episodic event by virtue of a common, low-frequency rhythm.
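The phase-amplitude form of CFC described above can be illustrated with a minimal sketch on synthetic data. No such analysis was part of the present study. The example uses one common coupling measure, the mean vector length, and every parameter is an assumption chosen for illustration: a 6 Hz slow rhythm, a 40 Hz fast rhythm, a 200 Hz sampling rate, and 4 to 8 Hz and 30 to 50 Hz analysis bands. A plain discrete Fourier transform keeps the sketch free of dependencies; an FFT library would normally be used instead.

```python
import cmath
import math


def dft(values, inverse=False):
    """Plain O(n^2) discrete Fourier transform, adequate for short signals."""
    n = len(values)
    sign = 1 if inverse else -1
    twiddle = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    out = [sum(v * twiddle[(k * t) % n] for t, v in enumerate(values))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def band_analytic(signal, fs, lo, hi):
    """Band-limited analytic signal: keep positive-frequency bins in [lo, hi] Hz."""
    n = len(signal)
    spectrum = dft(signal)
    kept = [2 * c if (0 < k < n / 2 and lo <= k * fs / n <= hi) else 0j
            for k, c in enumerate(spectrum)]
    return dft(kept, inverse=True)


def mean_vector_length(signal, fs, phase_band, amp_band):
    """Fast-band amplitude weighted by slow-band phase, averaged over time."""
    slow = band_analytic(signal, fs, *phase_band)
    fast = band_analytic(signal, fs, *amp_band)
    vectors = [abs(f) * cmath.exp(1j * cmath.phase(s)) for s, f in zip(slow, fast)]
    return abs(sum(vectors) / len(vectors))


def synthetic(fs=200, seconds=2, f_low=6, f_high=40, coupling=1.0):
    """Slow rhythm plus a fast rhythm whose amplitude follows the slow phase."""
    samples = []
    for i in range(fs * seconds):
        t = i / fs
        slow = math.cos(2 * math.pi * f_low * t)
        envelope = 0.5 * (1 + coupling * slow)  # coupling=0 gives a flat envelope
        samples.append(slow + envelope * math.cos(2 * math.pi * f_high * t))
    return samples


coupled = mean_vector_length(synthetic(coupling=1.0), 200, (4, 8), (30, 50))
uncoupled = mean_vector_length(synthetic(coupling=0.0), 200, (4, 8), (30, 50))
print(coupled, uncoupled)
```

With these noise-free signals, the coupled case yields a mean vector length close to 0.25 and the uncoupled case a value close to zero. Real recordings would additionally require proper filtering and a surrogate-based baseline.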
CFC has been hypothesized to occur during perception (VanRullen & Koch, 2003), providing a certain rhythmicity to the process, but working memory has also been related to CFC, between the gamma band on the one side and alpha and theta bands on the other. Indeed, perceptual and working memory processes are likely intertwined, as evidenced by the finding that prestimulus alpha synchronization can predict working memory performance (Myers, Stokes, Walther, & Nobre, 2014). With regard to working memory specifically, it has been proposed that coupling with lower-frequency bands serves to maintain information (Roux & Uhlhaas, 2014) and to keep ordered items separate (Buzsáki & Moser, 2013; Lisman & Jensen, 2013; Luck & Vogel, 2013), whereas gamma oscillations contain specific featural information.
Interestingly, the power of oscillations in the alpha frequency range has been causally related to the CDA component in a working memory task by Van Dijk, van der Werf, Mazaheri, Medendorp, and Jensen (2010), suggesting a direct link between the currently observed CDA amplitude differences due to temporal integration and low-frequency oscillations. Thus, transferring and maintaining episodic (perceptual) information in synchronized low-frequency oscillations may be a common form of neural communication by which temporal integration committed at an early stage of processing directly impacts the later representation in memory, including any chunking it might imply.
The Role of Attention
Apart from the effects related to working memory, this study also considered possible attentional differences, as reflected in the N2pc. To recap, the main difference in N2pc amplitude was found between one-target trials and two-target trials, regardless of the report in the latter (i.e., integrated or not). In terms of attention, a straightforward interpretation is that processing of two targets requires more (sustained) effort than processing one. Alternatively, the difference may be related to the number of features reported in the two cases, which was lower in the one-target trials. To contrast the target- and feature-related interpretations, successive exploratory cuts of the data were taken in which the four-feature trials were first removed from the two-target trials, reducing the overall average to 2.58 features, and then also the three-feature trials, reducing the average to 2 features. In both cases, the results remained virtually indistinguishable from the full set, with no hint toward reductions in N2pc amplitude, as would have been predicted from a feature-based account. Thus, it seems the observed N2pc difference is most likely to reflect the number of targets instead. One implication is that, at least in the present task, spatial integration (i.e., within a target stimulus) precedes the N2pc whereas temporal integration does not.
Another, target-unrelated interpretation of the difference between N2pc amplitude in one- and two-target trials is also conceivable, namely, that the distractor at Lag 1 in the one-target trials might have suppressed N2pc amplitude. In the model of the attentional blink proposed by Olivers and Meeter (2008), an inhibitory effect of a distractor following T1 is indeed predicted. There is, however, prior evidence that argues against this idea. Dell'Acqua and colleagues (Dell'Acqua, Doro, Dux, Losier, & Jolicœur, 2016; Dell'Acqua et al., 2015) showed that there was virtually no offset latency difference in the attention-related P3a component between conditions in which two targets followed each other directly and in which a target was followed by a distractor. Furthermore, they did not observe the hypothesized differences in the negative component post-P3a at frontal sites that should have accompanied distractor-based inhibition, and no difference in P3b latency between target–target and target–distractor trials. Taken together, these results suggest that the attentional blink results from processing the targets themselves, which is compatible with the model proposed by Wyble, Potter, Bowman, and Nieuwenstein (2011).
Either way, for the hypotheses of the current study, the comparison between the physically identical two-target trials was of principal interest. Contrary to previous reports of subtle differences in the N2pc evoked by stimuli perceived as an integrated whole and ones that were not (Akyürek & Meijerink, 2012), the component appeared to be similar in the present data. The nature of the task used to measure temporal integration is a likely reason for this discrepancy. In previous studies, a so-called missing element task was used (Akyürek, Schubö, & Hommel, 2010; Hogben & Di Lollo, 1974). In this task, a grid of 5 by 5 locations that each may or may not contain a small black square is presented twice, each time filled with 12 squares. Participants then attempt to find the one location in the grid that remained empty. Akyürek and Meijerink (2012) observed that the N2pc to the location of the missing element exhibited a more gradual build-up than the N2pc to a veridical singleton and that it took longer to dissipate. Two salient differences with the current task may play a role. First, focal attention is not as available in the missing element task, because the task is more spatially distributed, which is known to impair integration (Akyürek & van Asselt, 2015; Visser & Enns, 2001). Second, the number of relevant stimuli (25) far exceeds memory limits, which presumably prevents the participants from relying on anything other than earlier, more basic forms of temporal integration. In the current RSVP task, attention could be sustained at the single location of the relevant stream, and the target stimuli had a more manageable feature set. It is thus conceivable that temporal integration in the current task was mostly realized in a later processing phase, after the initial attentional target selections had already been made. 
In other words, the difference between tasks may provide different opportunities for visible and informational persistence to contribute (Loftus & Irwin, 1998; Coltheart, 1980; Di Lollo, 1980).
The similarity between integrated and separate targets in N2pc amplitude forms a striking contrast with the dissimilarity between these cases observed at the CDA and P3, which suggests that attention and memory function differently in the current task. This may be surprising, since selective attention and working memory seem to have a lot in common (Gazzaley & Nobre, 2012). It has even been suggested that working memory and attention are so closely related that the former may be considered as an instance of the latter, only directed toward internal representations, rather than external stimuli (Kiyonaga & Egner, 2013; Chun, 2011). The current results diverge from such a unitary approach by demonstrating a qualitative difference, namely that the nature of representation is changed from the attentional processing of the visual features (at the N2pc) to the consolidation and maintenance of that information (at the P3 and CDA). This difference may have been brought to light because the comparison between pairs of targets that were or were not integrated does not entail a difference in terms of selection; the relevant features were identical. Thus, by equalizing the selection aspect, which may constitute a large part of the observed overlap between attention and memory, the present data uniquely reflect on the representational aspect. It needs to be acknowledged, however, that the process of temporal integration may itself be held responsible for the observed change, which leaves the possibility that attention and memory are indeed functionally similar, but act on different representations, depending on what is made available to them by concurrent operations.
Costs and Benefits of Perceptual Pacing
Unlike spatial clustering, for which information loss is not a necessary outcome, when successive stimuli are integrated into a singular percept, precise information with regard to the timing of the individual stimuli may be lost. Previous studies on the relationship between reciprocal report order errors and temporal integration in RSVP lend credence to this idea (Akyürek et al., 2012). This information loss may be lamented, but the question should be posed in what kind of situation outside the lab such loss would matter. Even when things move rapidly in the visual field, such as when a ball flies toward a catcher, the perceptual analysis of the situation need not necessarily revolve around the briefest time intervals that one could possibly distinguish. Rather, it is important to gather a sense of motion, speed, and direction; properties that are arguably better analyzed when there is more time available for the stimulus to provide relevant information (e.g., motion blur indicating that the ball was displaced a little in a certain direction).
The point could thus be made that when it comes to perceptual analysis over time, shorter is not necessarily better. The optimal approach may rather be to focus on time intervals that are ecologically suitable, fitting to a pace in which natural events unfold. Such intervals may well be on the order of 100–250 msec, a range that has been associated with temporal integration windows in varied visual and auditory tasks (e.g., Tervaniemi, Saarinen, Paavilainen, Danilova, & Näätänen, 1994; Hogben & Di Lollo, 1974; Eriksen & Collins, 1967). This clearly contrasts with estimates of maximal temporal sensitivity, such as given by the critical fusion frequency (∼8 msec; Hecht & Verrijp, 1933).
The current data point toward a further ecological advantage of slower, event-based perception, namely that it affords efficiency. This may be a considerable advantage in view of the relatively high metabolic costs associated with cortical activity in general (Lennie, 2003) and the sizeable brain area devoted to (visual) perception. Support for a relationship between working memory function and energetic efficiency, which was not directly measured here, comes from a recent pupillometric study on mental effort in RSVP. This study by Wolff et al. (2015) showed that temporal integration of two targets was associated with decreased effort, comparable to that of a single target. Although mental effort is not a direct measure of energetic costs either, it seems likely the two are correlated. Reduced memory storage costs are thus observed under the same conditions as reduced mental effort and, presumably, energetic costs.
The current study demonstrated how the temporal integration of successive stimuli that are presented across brief intervals affords a form of chunking, resulting in increased cognitive efficiency. Perceptual “choices” made in a relatively early processing phase can thus yield substantial savings when it comes to consolidating and maintaining the stimuli in memory. The quest for optimal efficiency may thus have driven the human perceptual system toward preferential processing of longer, more comprehensive events, rather than the briefest intervals that it can resolve in principle.
The authors are grateful to Edward Awh for pointing us toward the CDA component during the genesis of the project and to Peter Albronda for technical support with the electrophysiological data acquisition.
Reprint requests should be sent to Elkan G. Akyürek, Department of Psychology, Experimental Psychology, University of Groningen, Grote Kruisstraat 2/1, 9712 TS Groningen, The Netherlands, or via e-mail: email@example.com.