Mental imagery (MI) is the ability to generate visual phenomena in the absence of sensory input. MI is often likened to visual working memory (VWM): the ability to maintain and manipulate visual representations. How MI is recruited during VWM is yet to be established. In a modified orientation change-discrimination task, we examined how behavioral (proportion correct) and neural (contralateral delay activity [CDA]) correlates of precision and capacity map onto subjective ratings of vividness and number of items in MI within a VWM task. During the maintenance period, 17 participants estimated the vividness of their MI or the number of items held in MI while they were instructed to focus on either the precision or the capacity of their representation and to retain stimuli at varying set sizes (1, 2, and 4). Vividness and number ratings varied over set sizes; however, subjective ratings and behavioral performance correlated only for vividness ratings at set size 1. Although CDA responded to set size as expected, CDA did not differ between high- and low-vividness trials, nor between nondivergent trials (reported number matched the probed number of items) and divergent trials (reported number diverged from the probed number). Participants were more accurate at low set sizes compared with higher set sizes and in coarse (45°) orientation changes compared with fine (15°) orientation changes. We failed to find evidence for a relationship between the subjective sensory experience of the precision and capacity of MI and the precision and capacity of VWM.

Our ability to generate perceptual phenomena in mind allows us to contemplate the future and remember the past, while navigating through the present. Mental imagery (MI) is defined as the ability to generate visual mental images in mind in the absence of sensory input (Kosslyn, 1980). MI is consistently likened to visual working memory (VWM; Tong, 2013), the ability to maintain and manipulate visual information in mind (Baddeley, 2003; Cowan, 2001; Baddeley & Andrade, 2000; Logie, 1995). However, the evidence does not yet warrant this conclusion. Previous research has suggested some people appear to recruit MI strategies in VWM tasks, whereas others do not (Bates & Farran, 2021; Keogh & Pearson, 2014). In the context of aphantasia, individuals report no sensory experience of MI while holding typical abilities in VWM (Pounder et al., 2022; Jacobs, Schwarzkopf, & Silvanto, 2018), which further suggests a distinction between these seemingly unified subprocesses. Empirical studies directly comparing the behavioral and neural substrates of MI and VWM are currently limited. To explain the relationship between MI and VWM, direct evidence is required to examine how MI is recruited within a VWM task.

Delineating the Relationship between MI and VWM

The investigation of how MI and VWM are related is limited, and the suggestion that they are similar functions is largely based on the parallels between the definitions of MI and VWM and the evidence for overlapping functional activation underpinning the two abilities (Spagna, Hajhajate, Liu, & Bartolomeo, 2021; Pearson, 2019; Lorenc, Lee, Chen, & D'Esposito, 2015; Sreenivasan, Curtis, & D'Esposito, 2014; Miller & D'Esposito, 2005). Much like in the MI neuroimaging literature (Spagna et al., 2021), there is evidence for a functional role of the frontal regions in VWM, namely, the lateral pFC (Sreenivasan et al., 2014; Miller & D'Esposito, 2005), but there is also evidence that the visual cortex plays an important role (Serences, 2016; Albers, Kok, Toni, Dijkerman, & De Lange, 2013). This has led to the argument that conflicting findings regarding the importance of either frontal or visual regions in VWM are likely dependent on individual differences in the recruitment of visual strategies in VWM (Pearson & Keogh, 2019; Linke, Vicente-Grabovetsky, Mitchell, & Cusack, 2011). For example, some studies show visual representations in VWM are decoded in early visual areas (V1–V3; Albers et al., 2013), and others demonstrate the importance of top–down connectivity between high-level regions, such as the lateral pFC and the visual cortex (Sreenivasan et al., 2014). It is therefore speculated that not all individuals approach visual memory tasks in the same manner; however, research is yet to test how individuals use different visual strategies—namely, MI—within a VWM task.

There is, however, evidence for shared visual representations between MI and VWM. Findings have shown that oriented gratings held in mind in VWM can be decoded using multivoxel pattern analysis in visual areas V1–V4 (Harrison & Tong, 2009). This has then been extended to show that a classifier trained on early visual area (V1–V3) activation in VWM trials reliably decoded activation in MI trials and vice versa (Albers et al., 2013). On the basis of this evidence, we might conclude that MI and VWM are therefore not distinguishable (Tong, 2013). However, behavioral evidence does not entirely align with this suggestion. Behavioral studies have adopted a sensory strength measure of MI, which quantifies the extent to which perception is altered following an imagery period in a binocular rivalry paradigm (Pearson, Clifford, & Tong, 2008). Results from this task have implied that the recruitment of visual strategies in VWM is dependent on MI strength. When visual noise is presented during the delay period of a VWM task, it negatively impacts performance, which is taken to suggest it disrupts the visual information from being held in mind (Baddeley & Andrade, 2000). This interpretation is supported by the finding that MI is also disrupted when background luminance is modulated (Pearson et al., 2008). In turn, it has been shown that VWM performance was significantly poorer in the modulated background luminance condition but only in those who scored highly on the MI sensory strength measure. It was therefore interpreted that only "good imagers" recruit visual strategies in VWM (Keogh & Pearson, 2011, 2014). In addition, our group has recently found no significant associations between visual and transformation components of MI and maintenance and manipulation measures of VWM (Bates & Farran, 2021), further adding to ambiguity around the types of strategies individuals recruit in VWM tasks.

This is not the only study to imply individual differences in the recruitment of MI strategies for VWM. A 2020 study that examined the effects of training a visualization strategy for a set of VWM tasks in adults found that in the control group (no strategies trained), only 4% reported visualization (e.g., "I visualised the numbers") and no participants in the control group reported a self-generated imagery strategy (e.g., "I tried to associate each digit with some image in my mind"; Forsberg, Fellman, Laine, Johnson, & Logie, 2020). Instead, self-generated strategies included rehearsal ("I repeated the list of letters in my mind"), grouping ("I remembered the digits in groups"), and other ("I made up a song…."). Examining the extent to which individual differences in MI impact VWM would further elucidate the role of visual strategies/MI in supporting memory. Research thus far has been restricted to comparisons between absolute performance on MI measures and on VWM measures. To fully elucidate how MI supports VWM, it is necessary to investigate how within-task individual differences in the precision of visual representations and the sensory experience of MI impact VWM performance.

Measuring MI within a VWM Task

The visual precision and capacity of VWM maintenance have been documented using the study of ERPs, namely, contralateral delay activity (CDA). In their seminal paper, Vogel and Machizawa (2004) found that CDA is modulated as a function of the number of items held in mind up to four items. The finding that CDA can index VWM capacity has since been replicated (see Luria, Balaban, Awh, & Vogel, 2016, for a review), and there is evidence for individual differences in that greater CDA amplitude is denoted in individuals with good VWM compared with those with poorer VWM (Adam, Robison, & Vogel, 2018).

The visual precision of VWM representations held in mind can also be indexed by CDA amplitudes. Researchers applied an orientation-discrimination paradigm to not only discriminate between CDA amplitudes associated with increasing set size but also those associated with coarse (45°) and fine (15°) orientation discriminations. Here, it was found that at smaller set sizes, there was greater CDA amplitude in fine orientation discriminations compared with coarse. Thus, it was interpreted that at lower capacities, individuals exert willful control over the visual precision of their representations and that the CDA amplitude can reflect both the precision and capacity of maintained representations (Machizawa, Goh, & Driver, 2012). This evidence has been extended to show that CDA is modulated by instruction. When participants were instructed to focus on precision, CDA was associated with gray matter volume in the left lateral occipital area, whereas when instructed to focus on capacity, CDA was associated with gray matter volume in the right intraparietal sulcus (Machizawa, Driver, & Watanabe, 2020). These findings support a threshold model of VWM and demonstrate the importance of accounting for both the visual precision and number of items.

Notably, there is overlap between how visual representations are described in the parallel VWM and MI literatures. Specifically, what might be described in the VWM literature as the visual precision of representations would ultimately be described as the visual vividness or quality of mental images in the MI literature. We might therefore assume that at smaller set sizes, neural correlates of precision, that is, CDA, reflect the visual quality of mental images held in mind during VWM, and otherwise CDA reflects the capacity of visual items held in mind. However, this has not been measured alongside the reported subjective, sensory experience of MI. For simplicity, we will use the term visual precision when referring to the instruction to attend to the precision of the representation, vividness when referring to the subjective vividness rating of representations, capacity when referring to the instruction to attend to the capacity of the representations, and number of items when referring to the subjective rating of the number of items held in mind.

The most common approach to MI research is to measure the sensory experience of MI using subjective ratings. This is not surprising given that MI is an inherently private and variable sensory experience. In the quest to establish evidence to suggest that visually depictive representations are recruited during MI, research has examined the relationship between the subjective, sensory experience of MI and selective neural activation of visual areas. For example, studies have adopted trial-by-trial vividness ratings. A significant positive association between the behavioral MI sensory strength score and trial-by-trial subjective vividness ratings (1 = almost no imagery, 2 = some weak imagery, 3 = moderate imagery, 4 = strong imagery almost like perception) has been evidenced (Pearson, Rademaker, & Tong, 2011). This was interpreted to suggest that individuals have good insight into their MI. More recently, it has been shown that the overlap between brain regions activated during MI and during visual perception is positively associated with trial-by-trial subjective vividness (1 = not vivid at all to 4 = very vivid; Dijkstra, Bosch, & van Gerven, 2017). With respect to confidence, Williams, Robinson, Schurgin, Wixted, and Brady (2022) manipulated instruction to demonstrate confidence in responses reflects memory strength in a VWM task. However, whether individuals have good insight into the precision and capacity of their representations during a VWM task is yet to be tested.

Taken together, based on the behavioral and neural findings, we might assume that the subjective sensory experience of the vividness of MI maps onto the precision of visual representations. However, this has not been directly assessed with respect to VWM because these processes have been examined in parallel literatures. Although evidence in the VWM literature suggests that the number of items and the precision of items held in mind during the delay period in VWM can be quantified by CDA, the extent to which this reflects the subjective sensory experience of the number of items and precision of items in MI is yet to be addressed. Therefore, adapting a VWM paradigm to include trial-by-trial subjective vividness ratings and capacity ratings (number of items in mind) presents a novel opportunity to address the current gap in the literature in understanding how individual differences in MI impact VWM.

The Current Study

The current study was designed to directly examine how MI is recruited in a VWM task through two aims. The first aim is to characterize how behavioral and neural correlates of VWM are modulated by expectations of instruction (precision/capacity) and type of subjective ratings (vividness/number). For clarity, precision is adopted from the VWM literature (such as Machizawa et al., 2012; Zhang & Luck, 2008) and forms the dependent variable of proportion correct in an orientation-discrimination task where stimuli are presented at varying levels of precision (fine precision/15° orientation change and coarse precision/45° orientation change). The term vividness is adopted from the MI literature (e.g., Pearson et al., 2011; Marks, 1995) and refers to subjective ratings of how vivid participants deem the representation they held in mind during each orientation-discrimination trial. The second aim is to establish the metacognitive link between the subjective sensory experience of MI and behavioral and neural correlates of VWM (CDA). Our hypotheses are outlined at the end of the Methods section.

Participants

Participants were recruited from the participant recruitment system database at Birkbeck, University of London and the surrounding community of London, United Kingdom. All participants gave written informed consent and had the option of receiving £25 to participate or the equivalent course credit. Ethics approval was provided by the university ethics committee. Participants had normal or corrected-to-normal vision, and each participant completed the Ishihara 38 Plates CVD Test (https://www.color-blindness.com/ishihara-38-plates-cvd-test/) to check for red–green color deficiencies and were required to score "none" to participate. Twenty-three individuals were recruited for the final experiment. Before artifact rejection, two participants were excluded due to incomplete data sets caused by technical issues; three more participants were excluded as they did not respond in any of the trial-by-trial subjective ratings and thus did not produce any behavioral ratings data. One more participant was excluded following artifact rejection because less than 75% of the total trials remained. Seventeen participants were included in the reported results (age: M = 26.00 years, SD = 4.39 years, 10 women). Power is outlined in the next section alongside trial numbers.

Materials and Procedure

A classic orientation discrimination VWM paradigm developed by Machizawa and colleagues (2012, 2020) was adapted to include within-trial subjective ratings of MI (see Figure 1 for schematic of trial sequence and outline of blocks). Participants were instructed to memorize an array of bars, hold the orientation of bars in mind, rate either the vividness or capacity of their MI, and subsequently determine whether the highlighted bar in the probe array had been rotated clockwise or counterclockwise. A fixation point was presented in the center of the screen throughout the trial, and participants were required to maintain their gaze at the fixation point. First, participants were cued to memorize either the bars presented to the left or right side of the screen. Second, the sample display was presented, which consisted of two, four, or eight bars (one, two, or four bars to be remembered and presented to each hemifield, respectively). Participants were instructed to maintain fixation at the central fixation point and hold the bars in mind as accurately as possible during the subsequent delay. Following the delay, a tone rating cue was presented to cue participants to rate either the vividness of the representation held in mind or number of items they had in mind, depending on the block. The tone was generated in Cogent 2000 and comprised a 250-Hz sine wave lasting 200 msec, which was played from a speaker placed behind the participant's chair. Finally, a probe array was presented, which was the same as the sample array except that the highlighted bar/item had been rotated. Fine (15° orientation change) and coarse (45° orientation change) trials were randomized within each block, as were clockwise and counterclockwise orientations. Participants were required to respond as to whether the highlighted bar had been rotated right (clockwise) or left (counterclockwise).

Figure 1.

(A) Trial sequence. Intertrial interval ranged between 500 and 700 msec. For each trial, an arrow cue was presented for 200 msec to indicate which side of the screen should be attended to. This was followed by a 300- to 500-msec interval before the sample array was presented for 200 msec. The sample array consisted of one, two, or four bars on each side of the screen (set size 2 pictured) and either red (precision-focused instruction block) or green (capacity-focused instruction block, pictured) bars. This was followed by a 1300-msec delay period whereby participants had to hold the image in mind. After the delay, a tone rating cue was played and participants provided either a vividness or capacity rating, depending on the block. Subsequently, a probe array was presented until the participant responded (or 2500 msec) whereby all stimuli except the target stimulus were presented in black. Participants were required to judge whether the target was rotated clockwise (pictured) or counterclockwise compared with the memorized sample array. (B) Schematic representation of experiment procedure. Order of blocks was counterbalanced per participant.


Four blocks of 96 trials (384 total trials) were presented with two breaks within each block and an additional break between blocks to reduce fatigue and boredom (see Figure 1B). Blocks were differentiated by instruction and rating type. In the precision-focused instruction block, participants were asked to focus on holding a visually precise image in mind, and in the capacity-focused instruction block, participants were required to focus on holding as many items in mind as required (i.e., they should try to hold all four items in mind in the four-item condition). There were two rating types: vividness and number. In vividness rating blocks, participants were required to rate the vividness of the representation held in mind on a scale of 1–4 in line with previous paradigms: 1 = almost no image, 2 = weak image, 3 = moderate image, 4 = strong image/almost like perception (as in Pearson et al., 2011). In capacity rating blocks, participants were required to rate the number of items they held in mind (see Figure 1B for the procedure). The order of blocks was counterbalanced per participant. Set size (one item, two items, four items), precision (fine, coarse), and attended side (left, right) were randomized within each block, resulting in eight trials per condition. Ngiam, Adam, Quirk, Vogel, and Awh (2021) conducted simulations to estimate how many participants and trials are required for different levels of power in CDA analyses. It was suggested that 30–50 trials were required per condition to detect the presence of CDA, and up to 400 trials per condition with 25 participants could be needed to detect differences between set size conditions in CDA with 80% power. The task with 384 trials already takes just under an hour to complete; therefore, adding more trials would have compromised data quality. Moreover, robust CDA effects have been established in previous studies with ∼20 participants and ∼80 trials per condition (Machizawa et al., 2012, 2020).
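The block and condition arithmetic above can be sanity-checked with a short sketch (purely illustrative; the factor names are ours and not taken from the experiment code):

```python
from itertools import product

# Block structure: 2 instruction types x 2 rating types, 96 trials each
blocks = list(product(["precision-focused", "capacity-focused"],
                      ["vividness", "number"]))
trials_per_block = 96

# Within each block, these factors are fully crossed and randomized
set_sizes = [1, 2, 4]
precisions = ["fine", "coarse"]   # 15 deg vs. 45 deg orientation change
sides = ["left", "right"]

cells = len(set_sizes) * len(precisions) * len(sides)   # 12 cells per block
trials_per_cell = trials_per_block // cells             # 8 trials per cell
total_trials = len(blocks) * trials_per_block           # 384 trials overall

print(cells, trials_per_cell, total_trials)  # 12 8 384
```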

To familiarize participants with the task, they completed a precision-focused block and a capacity-focused block (with either vividness or capacity ratings, counterbalanced) with 24 trials per block as practice. The practice blocks were repeated if participants scored below 65% correct. A confidence rating was included at the end of each experimental block, where participants were asked to rate their confidence in their behavioral performance of that block. Although the subjective rating is purposefully placed before the probe array in the trial sequence to reduce the confound of confidence, a weak correlation between subjective ratings and confidence was expected. To test this, participants were presented with a blank gray screen at the end of the block with "confidence?" in the center, and they were required to answer according to a standard 5-point Likert scale: 1 = not confident at all, 2 = slightly confident, 3 = somewhat confident, 4 = fairly confident, 5 = completely confident (four confidence trials in total).

EEG Recording

EEG data were continuously recorded at a 1000-Hz sampling rate using a fitted cap (EASYCAP) with 64 Ag-AgCl passive electrodes arranged according to the International 10–20 system using a BrainVision BrainAmp amplifier. No online filters were applied during the recording. The cap included two horizontal EOG channels mounted in the cap at the FT9 and FT10 locations. A vertical EOG channel was placed directly underneath the right eye to monitor blinks and saccades. Electrical impedance was kept below 5 kΩ. During the recording, FCz acted as the reference electrode and AFz as the ground electrode.

Preprocessing of EEG Data and CDA Extraction

After the recording, the continuous data were preprocessed offline in MATLAB (The MathWorks, Inc.) using the MATLAB toolbox EEGLAB (Version 2019.1; Delorme & Makeig, 2004). Data were filtered offline with an 8th-order Butterworth bandpass filter at 0.05–30 Hz and resampled at 500 Hz. Data were then epoched to −200 to 1400 msec around the sample array onset and baseline corrected (−200 to 0 msec). Blinks during the sample array onset (0–200 msec) were first detected using a moving window peak-to-peak detection algorithm with a window size of 200 msec, a step of 10 msec, and a threshold of 50 μV; trials with blinks during the sample array onset were then rejected (M = 23, SD = 18, range = 4–61).
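The moving-window peak-to-peak detection described above can be sketched as follows (a minimal pure-Python illustration on synthetic data; the actual analysis used EEGLAB's rejection routines):

```python
def peak_to_peak_reject(signal, srate=500, win_ms=200, step_ms=10, thresh_uv=50.0):
    """Return True if any sliding window exceeds the peak-to-peak threshold.

    signal: one epoch from one channel, in microvolts, sampled at `srate` Hz.
    """
    win = int(win_ms * srate / 1000)    # 200 msec -> 100 samples at 500 Hz
    step = int(step_ms * srate / 1000)  # 10 msec -> 5 samples at 500 Hz
    for start in range(0, max(1, len(signal) - win + 1), step):
        seg = signal[start:start + win]
        if max(seg) - min(seg) > thresh_uv:  # peak-to-peak amplitude
            return True
    return False

# Synthetic traces: a flat epoch passes; a 100-uV blink-like deflection is flagged
clean = [0.0] * 500
blink = [0.0] * 200 + [100.0] * 50 + [0.0] * 250
print(peak_to_peak_reject(clean), peak_to_peak_reject(blink))  # False True
```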

Next, a bipolar HEOG channel was derived (right horizontal EOG channel subtracted from left horizontal EOG channel) to observe the magnitude of left and right saccades, and an algorithm to detect square waves in this channel was applied with the threshold criterion set to ± 18 μV. Mean amplitudes between 300 and 500 msec following cue onset were calculated for each visual angle (2°: M = 10.88, SD = 2.43; 4°: M = 22.82, SD = 6.07; 6°: M = 36.91, SD = 10.04; 8°: M = 48.65, SD = 12.21; 10°: M = 59.94, SD = 14.09). A repeated-measures ANOVA of amplitude with a within-subject factor of Visual Angle (2°, 4°, 6°, 8°, 10°) revealed increasing amplitude with visual angle, F(4, 16) = 73.82, p < .001, ηp2 = .94. Post hoc comparisons showed no overlap across visual angles (ps < .001). On the basis of the mean amplitudes, a simple formula can be applied to estimate the degree of horizontal eye movement: y = x / 6, where x is the bipolar HEOG channel amplitude (in μV) and y is the degrees the eyes moved. The stimuli in the main experiment were presented between 2.5° and 6.5° visual angle, and the formula indicates that 2° saccades would correspond to a mean bipolar HEOG channel amplitude of ± 12 μV and 3° saccades to ± 18 μV. To avoid overcorrection of the data, 18 μV was chosen as the final threshold to detect saccades in the main experiment trials.
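Under the calibration above, the amplitude-to-degrees conversion is a one-line formula; a hypothetical helper makes the two reported thresholds explicit:

```python
def heog_to_degrees(amp_uv):
    """Estimate horizontal eye movement in degrees from bipolar HEOG amplitude,
    using the calibration y = x / 6 reported in the text."""
    return amp_uv / 6.0

def degrees_to_heog(deg):
    """Inverse mapping: degrees of eye movement to expected HEOG amplitude (uV)."""
    return deg * 6.0

# The two thresholds discussed in the text
print(heog_to_degrees(12.0))  # 2.0 (degrees, conservative threshold)
print(degrees_to_heog(3.0))   # 18.0 (uV, the threshold actually adopted)
```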

Although this was effective in detecting saccades, the algorithm also detected ± 18-μV square waves that were too quick to be saccades (i.e., 50 msec; mean number of trials detected = 86, SD = 64, range = 7–200). Therefore, the trials flagged by the algorithm were checked by eye to determine whether the square waves detected were in fact saccades, that is, the square wave spanned ∼200 msec (mean number of trials detected = 32, SD = 34, range = 7–118). As can be seen from the range, if all trials with saccades were removed, this would result in more participants being excluded because of insufficient data. Research has shown that applying independent component analysis (ICA) to remove saccade and blink components does not distort data for CDA analyses and is therefore an efficient method to retain data (Drisdelle, Aubin, & Jolicoeur, 2017). ICA was therefore conducted using the Second Order Blind Identification algorithm in EEGLAB, and components were observed using ICLabel (Pion-Tonachini, Kreutz-Delgado, & Makeig, 2019). Saccade and blink components were detected with the aid of ICLabel, which labels components according to the pattern of activity (e.g., eye component, muscle components). An average of two (SD = 1, range = 1–5) components that were deemed either blink or saccade components were removed.

The blink and saccade algorithms were re-applied to the ICA corrected data, and any remaining trials with saccades exceeding the 18-μV threshold and blinks exceeding the 50-μV threshold were rejected (4 ± 2, range = 0–8). Finally, extreme values of ± 75 μV and abnormal trends of linear drift over the entire epoch time-window (50 μV, r = .80) were detected and rejected. The average number of remaining trials including all conditions for the CDA analyses following all artifact rejection was 335 trials (SD = 30, range = 278–64). The number of trials remaining following artifact rejection was similar across all conditions (mean = 6.98, SD = .16, range = 6.65–7.25). The cleaned data were then computationally rereferenced to bilateral mastoid electrodes (T9 and T10), in line with previous literature conducting CDA analyses (Machizawa et al., 2012). Channels rejected because of noise by the EEGLAB automated criteria were interpolated (1 ± 1, range = 0–2). Finally, as in convention, the average CDA waveform was obtained from posterior parietal and temporal-occipital channels (namely, P5/6, P7/8, PO3/4, PO7/8, and O1/2), and CDA amplitude was computed from 400 to 1400 msec after sample onset for each condition.
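The final CDA computation (contralateral minus ipsilateral activity averaged from 400 to 1400 msec after sample onset) can be sketched as follows; the waveforms here are synthetic and the function is an illustration of the convention, not the analysis code:

```python
def cda_amplitude(contra, ipsi, srate=500, epoch_start_ms=-200, win_ms=(400, 1400)):
    """Mean contralateral-minus-ipsilateral amplitude in the CDA window.

    contra/ipsi: channel-averaged waveforms (uV) for one condition, from an
    epoch beginning at `epoch_start_ms` relative to sample array onset.
    """
    lo = int((win_ms[0] - epoch_start_ms) * srate / 1000)  # 400 msec -> sample 300
    hi = int((win_ms[1] - epoch_start_ms) * srate / 1000)  # 1400 msec -> sample 800
    diff = [c - i for c, i in zip(contra[lo:hi], ipsi[lo:hi])]
    return sum(diff) / len(diff)

# Toy example: a sustained -1.5 uV contralateral negativity across the epoch
n = 800  # samples in a -200 to 1400 msec epoch at 500 Hz
contra = [-1.5] * n
ipsi = [0.0] * n
print(round(cda_amplitude(contra, ipsi), 2))  # -1.5
```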

Data Analysis

Tests of normality revealed some variables were not normally distributed; however, parametric analyses were applied given that ANOVA is robust to violations of assumptions of normality (Blanca, Alarcón, Arnau, Bono, & Bendayan, 2017). All within-subject post hoc comparisons are reported with Bonferroni corrections. Where assumptions of sphericity were violated, Greenhouse–Geisser estimates are reported. RTs for subjective ratings and behavioral responses to the probe array that were less than 250 msec or equal to 2500 msec (no response) were excluded from analyses as invalid responses.
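The RT exclusion rule can be expressed as a simple filter (an illustrative sketch; variable names are ours):

```python
def valid_rts(rts_ms, floor=250, ceiling=2500):
    """Drop anticipatory responses (< 250 msec) and no-response trials
    (RT equal to the 2500-msec response deadline)."""
    return [rt for rt in rts_ms if rt >= floor and rt != ceiling]

# 180 msec is anticipatory, 2500 msec is a timeout; both are excluded
print(valid_rts([180, 420, 2500, 950]))  # [420, 950]
```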

Hypotheses and Aims

With reference to the first aim, behavioral outcomes were first expected to replicate previous findings (Machizawa et al., 2012; Vogel & Machizawa, 2004). Specifically, accuracy (proportion correct) was predicted to be greater in coarse versus fine orientation-discrimination trials and greater in lower (one and two items) versus higher (four items) set sizes. We also predicted an interaction between instruction (precision-focused vs. capacity-focused) and precision (fine vs. coarse) in that there would be greater accuracy in coarse versus fine precision in precision-focused (try to maintain a highly precise representation) trials only but not in capacity-focused trials. With regard to the focus of attention, instruction was also expected to modulate subjective ratings in that greater vividness ratings were expected in precision-focused blocks compared with capacity-focused (try to maintain as many items as required) blocks, and greater capacity ratings were expected in capacity-focused blocks compared with precision-focused blocks.

Measuring EEG during the behavioral VWM task allows for the unique opportunity to directly measure the visual precision and capacity of items held in mind during the delay period (via CDA). We further expected instruction to modulate the usage of memory resource indexed by the CDA (Machizawa et al., 2020). As CDA is measured during the delay period and before the behavioral response, if CDA is modulated by instruction, this would demonstrate that individuals can flexibly control the precision and capacity of their visual representations at will (as implied in previous evidence: Machizawa et al., 2020; Zhang & Luck, 2008). If this is the case, differences in CDA were expected between precision-focused and capacity-focused trials at low set size, but not between fine and coarse precision trials; this is because participants can expect and prepare for either precision- or capacity-focused responses, whereas actual difficulty (fine vs. coarse trials) was not cued in this experiment and therefore could not be prepared for. In summary, this will extend previous findings by examining how instruction modulates VWM consumption as indexed by CDA amplitude and how it interacts with the number of items. Finally, it was assumed that the established CDA set size effect would be replicated here in that CDA would increase as a function of set size up to four items (e.g., Machizawa et al., 2012; Vogel, McCollough, & Machizawa, 2005; Vogel & Machizawa, 2004).

Next, we tested the relationship between confidence ratings and vividness and number ratings. We expected a significant positive association between confidence and vividness ratings, and a significant negative association between confidence and nondivergent ratings (when participants reported holding the correct number of items in mind). As previous evidence has demonstrated a positive association between confidence and strength of memory (Rademaker & Pearson, 2012), we expect to find such a relationship with the vividness and number of items reported in this VWM task.

Finally, we examined how the sensory experience of MI was associated with behavioral and neural correlates of VWM. Evidence for significantly greater accuracy in trials rated as high vividness compared with low vividness was predicted and significantly greater accuracy in nondivergent number ratings (e.g., rated two items in mind when required to remember two items) compared with divergent number ratings (e.g., rated two items in mind when required to remember four items) was also predicted. With regard to neural correlates, it was predicted that CDA amplitudes would be significantly larger in high vividness trials compared with low vividness trials, and this effect would likely be greater in precision-focused trials at smaller set sizes. It was also predicted that CDA amplitudes would be significantly larger in larger set sizes in trials with nondivergent number ratings compared with trials with divergent number ratings. Together, this would support the assumption that individuals have good insight into their visual representations in both MI and VWM. Moreover, it will demonstrate that CDA maps not only the visual precision and/or capacity of representations but also the subjective sensory experience of MI within VWM. This would therefore provide a novel method for measuring the role of MI in VWM.

Characterizing the Visual Precision and Capacity of VWM Maintenance as Indexed by Proportion Correct, Subjective MI Ratings, and CDA

Accuracy (Proportion Correct)

Overall accuracy (as measured by proportion correct) was .71 (SD = .09), which is comparable to previous reports with a similar version of this task (Machizawa et al., 2012). Descriptive statistics of proportion correct for all conditions are reported in Figure 2.

Figure 2.

Mean (±SE) accuracy (proportion correct) per condition; for example, “right, fine, precision, vividness” refers to right attended, fine precision (15° orientation change), precision-focused instruction, vividness rating trials.


A repeated-measures five-way (3 × 2 × 2 × 2 × 2) ANOVA was conducted with proportion correct as the dependent variable and within-subject factors of Set Size (one item, two items, four items), Precision (fine, coarse), Instruction (capacity-focused, precision-focused), Rating (vividness, capacity), and Attended Side (left, right). First, as expected, accuracy varied significantly with set size, F(2, 32) = 82.59; p < .001; ηp2 = .84; Bonferroni-corrected post hoc comparisons revealed a significant decrease in proportion correct between all comparisons (all ps < .001). Also in line with previous findings, accuracy varied significantly with required precision, F(1, 16) = 10.31; p = .005; ηp2 = .39, with greater proportion correct in coarse precision (45° orientation-change) trials than in fine precision (15° orientation-change) trials. There was no main effect of Instruction, F(1, 16) = 2.58; p = .128; ηp2 = .39, BF10 = .82; Rating (F < 1, BF10 = .33); or Attended Side, F(1, 16) = 2.58, p = .128, ηp2 = .39, BF10 = .81, nor an interaction between Precision and Instruction, F(1, 16) = 3.31, p = .088, ηp2 = .17.

There was a significant three-way interaction between Attended Side, Set Size, and Instruction, F(2, 32) = 4.31, p = .022, ηp2 = .21. Follow-up ANOVAs for each set size were conducted to explore this interaction. There was a significant interaction between Attended Side and Instruction only in the four-item condition, F(1, 16) = 7.22; p = .016; ηp2 = .31 (one-item condition Attended Side × Instruction interaction: F < 1; two-item condition Attended Side × Instruction interaction: F(1, 16) = 1.88; p = .189; ηp2 = .11). Follow-up t tests revealed an effect of Attended Side: proportion correct was significantly greater in right attended trials than in left attended trials in the capacity-focused condition, t(16) = 3.01; p = .030, d = .36, but not in the precision-focused condition, t(16) = .58; p = 1.00; d = .07. The three-way interaction between Attended Side, Precision, and Rating was not significant (ns), F(1, 16) = 3.97; p = .064; ηp2 = .19, and the four-way interaction between Attended Side, Set Size, Instruction, and Rating was also ns, F(2, 32) = 2.98; p = .065; ηp2 = .16.

Trial-by-trial Subjective Ratings on Vividness and Number

Next, separate ANOVAs were conducted on vividness ratings and capacity ratings. The within-subject factors were Set Size (one item, two items, four items), Precision (fine, coarse), Instruction (capacity-focused, precision-focused), and Attended Side (left, right). Descriptive statistics of vividness and number ratings are presented in Figure 3.

Figure 3.

Mean and SE of vividness (top) and number (bottom) ratings per condition. The y-axis labels indicate condition, for example, right, fine, precision indicates right-attended, fine precision (15° orientation), and precision-focused instruction condition.


The vividness ratings ANOVA revealed a significant main effect of Set Size, F(2, 32) = 3.58; p = .040; ηp2 = .18; post hoc comparisons showed marginally higher vividness ratings when participants were required to remember one item than when they remembered four items (p = .055; all other ps > .05). There was no main effect of Precision (F < 1, BF10 = .33), Attended Side (F < 1, BF10 = .32), or Instruction, F(1, 16) = 3.71; p = .072; ηp2 = .18, BF10 = .97. There were no significant interactions (all Fs < 1.06; ns).

An equivalent ANOVA was conducted on capacity ratings with the same within-subject factors as the vividness ratings ANOVA. In contrast to vividness ratings, number ratings varied monotonically as a function of set size, F(2, 32) = 32.04, p < .001, ηp2 = .67: capacity ratings increased with each increase in the number of items (two items > one item: p < .001; four items > two items: p = .004; four items > one item: p < .001). There were no main effects of Precision (F < 1, BF10 = .45), Instruction (F < 1, BF10 = .46), or Attended Side (F < 1, BF10 = .59), and there were no significant interactions (all Fs < 1, ns).

CDA

To examine how CDA was modulated by condition, an ANOVA was conducted on grand-averaged CDA with within-subject factors of Set Size (one item, two items, four items), Precision (fine, coarse), Instruction (capacity-focused, precision-focused), Rating (vividness, capacity), and Attended Side (left, right). Mean and standard error of grand-averaged CDA for all conditions are presented in Figure 4.

Figure 4.

Mean and SE of grand-averaged CDA per condition. The y-axis labels indicate condition, for example, right, fine, precision indicates right-attended, fine precision (15° orientation), and precision-focused instruction condition.


In line with previous reports, CDA significantly increased as a function of set size, F(2, 32) = 14.06; p < .001; ηp2 = .47. Post hoc comparisons revealed significantly greater CDA for two items (M = −1.31, SD = .59) than for one item (M = −.99, SD = .60; p = .020) and for four items (M = −1.58, SD = .85) than for one item (p < .001); the difference between two and four items was ns (p = .07). Given our sample size, a power analysis confirmed this effect is powered to .91 with just eight participants (input parameters: f² = .94, α = .001, target power = .80, number of groups = 1, number of measurements = 3). There was no main effect of Precision (F < 1, BF10 = .33), Instruction (F < 1, BF10 = .33), Rating (F < 1, BF10 = .35), or Attended Side (F < 1, BF10 = .37). There was a significant three-way interaction between Precision, Instruction, and Attended Side, F(1, 16) = 6.01; p = .026; ηp2 = .27, and a significant four-way interaction between Instruction, Attended Side, Set Size, and Rating, F(2, 32) = 4.06; p = .027; ηp2 = .20.

Follow-up ANOVAs on precision-focused and capacity-focused blocks, respectively, were conducted to explore the significant three-way interaction. There was a significant interaction between Attended Side and Precision in the capacity-focused condition only, F(1, 16) = 5.85; p = .028; ηp2 = .27 (precision-focused condition Attended Side × Precision interaction: F < 1). Follow-up t tests of the effect of Precision for each attended side revealed significantly greater (more negative) CDA in coarse than in fine trials in the right attended condition, t(16) = 2.74; p = .015; d = .66, but no difference between fine and coarse trials in the left attended condition, t(16) = 1.23; p = .238; d = .29.

With regard to the four-way interaction, there was a significant interaction between Attended Side, Set Size, and Rating in the capacity-focused trials only, F(2, 32) = 3.35; p = .048; ηp2 = .17 (precision-focused condition Attended Side × Set Size × Rating interaction: F(2, 32) = 1.41; p = .259; ηp2 = .08). Follow-up ANOVAs for each set size in the capacity-focused trials revealed a significant interaction between Rating and Attended Side in the two-item condition only, F(1, 16) = 4.50; p = .050; ηp2 = .08 (one-item condition Rating × Attended Side interaction: F(1, 16) = 2.39; p = .141; ηp2 = .14; four-item condition Rating × Attended Side interaction: F(1, 16) = 2.93; p = .107; ηp2 = .16). Although the means pointed toward greater CDA amplitude in left attend (M = −1.65, SD = 1.93) than in right attend (M = −1.04, SD = .04) trials for the capacity ratings, this difference was ns, t(16) = .97; p = .345; d = .24. There was also no significant difference between left attend (M = −1.16, SD = 1.35) and right attend (M = −1.28, SD = 1.59) trials in the vividness ratings condition, t(16) = .183; p = .857; d = .04. There were no other significant interactions (Set Size × Rating: F(2, 32) = 2.65, p = .086, ηp2 = .14; all other Fs < 1). Grand-averaged ipsilateral, contralateral, and CDA waveforms for each set size are presented in Figure 5A, and waveforms per block are presented in Figure 5B.

Figure 5.

(A) Grand-averaged waveforms for one-item (left), two-item (center), and four-item (right) trials. The sample array was presented from 0 to 200 msec; the vertical dotted line at 400 msec is added for reference (CDA amplitude was calculated as the mean amplitude between 400 and 1400 msec after sample onset). (B) CDA waveform per condition; CDA was calculated from 350 to 1400 msec.

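As a concrete illustration of the CDA measure described above, the following sketch (hypothetical variable names; it assumes contralateral and ipsilateral waveforms have already been averaged per condition) computes the contralateral-minus-ipsilateral difference wave and averages it over the 400–1400 msec analysis window:

```python
import numpy as np

def mean_cda(contra, ipsi, times, t_start=0.4, t_end=1.4):
    """Mean CDA amplitude: the contralateral-minus-ipsilateral ERP difference,
    averaged over an analysis window (here 400-1400 msec after sample onset,
    as used for the set-size waveforms in Figure 5A).

    contra, ipsi: 1-D arrays of ERP amplitudes (microvolts), one per time point.
    times: 1-D array of time points (s) relative to sample onset.
    """
    contra, ipsi, times = map(np.asarray, (contra, ipsi, times))
    diff = contra - ipsi                       # lateralized difference wave
    window = (times >= t_start) & (times <= t_end)
    return float(diff[window].mean())          # negative values indicate CDA
```

For the per-block waveforms in Figure 5B, the same function would be called with `t_start=0.35`.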

The Relationship between Subjective Trial-by-trial MI Ratings and VWM Maintenance

The first set of analyses was conducted to investigate how behavioral and neural correlates of VWM were modulated by expectations of instruction (precision/capacity) and subjective ratings of items held in mind during the maintenance period (vividness/number). The next set of analyses addressed Aim 2: to examine the metacognitive link between subjective ratings of MI and behavioral and neural indices of VWM maintenance.

Behavioral Contrast between Low versus High Vividness Trials and Nondivergent versus Divergent Capacity Trials

To investigate whether individuals' subjective ratings reflected VWM accuracy, two paired-samples t tests were conducted to examine the difference in proportion correct between trials rated with high versus low vividness and between nondivergent versus divergent capacity ratings, respectively. High vividness ratings were trials where the participant rated either 3 (moderate image) or 4 (strong image/almost like perception); low vividness ratings were trials where the participant rated either 1 (almost no image) or 2 (weak image). Nondivergent capacity ratings were trials where the participant's rating did not diverge from the number of items they were required to hold in mind (e.g., required to hold four items in mind, reported holding four items in mind, score for trial = 0); divergent capacity ratings were trials where the rating diverged from the number of items they were required to hold in mind (e.g., required to hold four items in mind, reported holding two items in mind, score for trial = 2). First, there was no significant difference in proportion correct between high vividness trials (M = .73, SD = .15) and low vividness trials (M = .67, SD = .12), t(16) = 1.51; p = .152; d = .37. For the capacity ratings analysis, one participant was excluded because none of their trials were divergent, and another was excluded because none of their trials were nondivergent. There was a significant difference in proportion correct between nondivergent ratings (M = .77, SD = .07) and divergent ratings (M = .68, SD = .19), t(14) = 2.21; p = .040; d = .57, with greater accuracy in nondivergent trials; see Figure 6.
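The trial classification described above can be sketched as follows. The helper names are hypothetical, and the use of an absolute difference for the divergence score is an assumption (the text only gives examples where the reported number falls below the required number):

```python
def divergence_score(required: int, reported: int) -> int:
    """Divergence between the number of to-be-remembered items and the number
    the participant reported holding in mind (e.g., required 4, reported 2 -> 2;
    a perfect report scores 0). Assumes an absolute difference."""
    return abs(required - reported)

def vividness_bin(rating: int) -> str:
    """Bin a 1-4 vividness rating: 1 (almost no image) and 2 (weak image) count
    as 'low'; 3 (moderate image) and 4 (strong image/almost like perception)
    count as 'high'."""
    return "high" if rating >= 3 else "low"
```

A trial would then be labeled nondivergent when `divergence_score(...) == 0` and divergent otherwise.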

Figure 6.

Mean accuracy (proportion correct) for nondivergent and divergent number rating trials (left) and high and low vividness rating trials (right).


CDA in High versus Low Vividness Ratings and Nondivergent versus Divergent Capacity Ratings

To examine CDA between rating types at each set size and instruction, an ANOVA was planned with grand-averaged CDA as the dependent variable and Rating (high vividness, low vividness, nondivergent capacity, divergent capacity), Set Size (one item, two items, four items), Instruction (precision-focused, capacity-focused), and Attended Side (left, right) as within-subject factors. However, as the conditions were based on participant responses, there was at least one condition per participant with no responses (e.g., some participants did not rate any four-item trials as high vividness). Therefore, ANOVAs were conducted for vividness ratings and capacity ratings collapsed across all conditions except Vividness (number of high vividness responses: M = 91, SD = 43, range = 34–151; number of low vividness responses: M = 68, SD = 46, range = 4–140) and Divergence (number of nondivergent responses: M = 115, SD = 35, range = 65–174; number of divergent responses: M = 50, SD = 35, range = 0–110), respectively. Despite the imbalance in trial numbers, the accuracy variables for high and low vividness (W = .982, p = .971) and for nondivergent and divergent trials (W = .910, p = .137) were normally distributed, and therefore the assumptions for the correlations were met. The vividness ratings ANOVA included a within-subject factor of Vividness (high, low) and revealed no main effect of Vividness on grand-averaged CDA, F(1, 16) = 1.38; p = .258; ηp2 = .08. The capacity ratings ANOVA included a within-subject factor of Divergence (nondivergent, divergent); similar to the vividness ANOVA, there was no main effect of Divergence (F < 1; Figure 7).

Figure 7.

Mean grand-averaged CDA in high and low vividness trials, and in divergent and nondivergent number rating trials.


Relationship between Proportion Correct and Subjective MI Ratings As a Function of Set Size

Spearman's correlations were conducted to examine the relationship between proportion correct and rating at each set size. As the analyses above indicated only an effect of set size on ratings and proportion correct, data were collapsed across precision, instruction, and attended side to retain power in the following analyses. Individual differences in vividness ratings were significantly and positively associated with proportion correct only in one-item trials (rs = .578; p = .015); there were no significant correlations in two-item trials (rs = .143; p = .585) or four-item trials (rs = .010; p = .974).

For the capacity ratings analysis, the divergence score was used. Capacity divergence was not associated with proportion correct in one-item trials (rs = −.427; p = .088), two-item trials (rs = −.369, p = .144), or four-item trials (rs = −.327, p = .200). Taken together, these findings suggest that participants have relatively poor insight into the visual precision (vividness rating) and the number of visual items (capacity rating) of representations held in VWM, except for visual precision (vividness) at the smallest set size (one item).

Relationship between CDA and Subjective MI Ratings As a Function of Set Size

To assess the relationship between CDA and subjective MI ratings, separate correlations were conducted for vividness ratings and capacity ratings. For vividness ratings, the CDA dependent variable was computed as the difference between grand-averaged CDA for one-item and two-item trials per participant, given that vividness is expected to be more prominent at smaller set sizes. The vividness dependent variable consisted of the mean vividness rating for two-item trials per participant. The logic is that if vividness ratings map onto the number of items in mind as indexed by CDA, there should be a positive association between vividness ratings in two-item trials and the difference in CDA between one- and two-item trials; that is, the greater the set size effect in CDA, the higher the vividness rating. Thus, we assessed how ratings related to the CDA modulation effect of VWM. Moreover, a difference score of CDA, rather than individual ERP amplitudes, controls for nonneural influences on the signal (e.g., a participant's skull thickness or scalp condition). There was no relationship between the one- versus two-item CDA difference and vividness ratings in two-item trials (rs = −.314; p = .220).

For capacity ratings, the CDA dependent variable was computed as the difference between grand-averaged CDA for one-item and four-item trials, and the capacity ratings dependent variable consisted of the mean capacity rating for four-item trials. As above, the logic is that if capacity ratings map onto the number of items held in mind as indexed by CDA, there should be a positive association between capacity ratings in four-item trials and the difference in CDA between one-item and four-item trials; that is, the greater the set size effect in CDA, the more items the participant reports holding in mind. However, there was no relationship between the one- versus four-item CDA difference and capacity ratings in four-item trials (rs = .302; p = .239).
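The correlations in this section can be sketched as follows. Variable names are hypothetical, and the rank-correlation implementation is a generic stand-in for whatever statistics package was actually used:

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation: Pearson correlation of tie-averaged ranks."""
    def ranks(x):
        x = np.asarray(x, dtype=float)
        r = np.empty(len(x))
        r[np.argsort(x)] = np.arange(1, len(x) + 1)
        for v in np.unique(x):          # average ranks over tied values
            r[x == v] = r[x == v].mean()
        return r
    ra = ranks(a) - ranks(a).mean()
    rb = ranks(b) - ranks(b).mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))

def cda_effect_vs_rating(cda_small, cda_large, mean_ratings_large):
    """Correlate each participant's CDA set-size effect (CDA at the smaller set
    size minus CDA at the larger set size; more positive = bigger effect, since
    CDA is negative-going) with their mean rating at the larger set size
    (e.g., one- vs. four-item CDA against four-item number ratings)."""
    effect = np.asarray(cda_small) - np.asarray(cda_large)
    return spearman_rho(effect, mean_ratings_large)
```

A positive correlation here would indicate that participants with a larger CDA set-size effect also report more items (or higher vividness) at the larger set size.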

Relationship between Subjective MI Ratings and Confidence Ratings

A Pearson's correlation was conducted between mean confidence ratings for vividness rating blocks (M = 3.53, SD = .62) and mean vividness ratings (M = 1.58, SD = .26). This revealed a positive correlation between confidence and vividness ratings (r = .508; p = .037), suggesting that the higher participants rated vividness, the greater their confidence in their VWM accuracy. The equivalent Pearson's correlation was conducted between mean confidence ratings for capacity blocks (M = 3.47, SD = .91) and mean capacity divergence scores (M = .58, SD = .40). The mean divergence score was calculated from divergent and nondivergent responses: nondivergent responses (scored 0) were trials where the participant reported having all items in the array in mind (e.g., required to remember four items and rated 4), and divergent responses were trials where the rating diverged from the number of items the participant was required to remember (e.g., required to remember four items, reported remembering two items, divergence score for the trial = 2, whereas a perfect report scores 0). This showed a strong negative correlation between confidence ratings and divergence scores (r = −.737; p < .001), suggesting that the lower the divergence between the number of to-be-remembered items and the number of items in mind, the greater the confidence participants had in their VWM performance (Figure 8).

Figure 8.

Scatter plots of vividness rating as a function of confidence rating (left) and of divergence score as a function of confidence rating (right).


The overarching goal of this study was to investigate how MI is recruited in VWM. The first aim was to characterize how instruction (precision-focused vs. capacity-focused) and the type of subjective rating (vividness vs. number) modulated the neural (CDA) and behavioral (accuracy) correlates of VWM. The second aim was to examine the relationship between the subjective sensory experience of MI (vividness and number) and the behavioral and neural correlates of VWM. We failed to find evidence that instruction, type of rating (vividness or number), or precision (fine vs. coarse orientation) modulated CDA or proportion correct. Previous findings regarding set size were replicated: poorer proportion correct with increasing set size and greater (more negative) CDA amplitude with increasing set size. We found no evidence for a relationship between MI and the visual precision and capacity of representations held in VWM. This may have implications for theory on the role of consciousness in VWM and for future methodology applied to understand individual differences in VWM. The findings are discussed in turn below.

The Interaction between Subjective Ratings of MI and the Behavioral and Neural Correlates of VWM Maintenance

Previous findings were replicated in that proportion correct was greater at smaller set sizes than at larger set sizes (Machizawa et al., 2012, 2020). Contrary to expectations, proportion correct was not modulated by the cued conditions of instruction and type of subjective rating. Previous evidence suggests that individuals exert willful control over the precision of visual representations, as instructed, and that this in turn influences their performance (Machizawa et al., 2012, 2020; Zhang & Luck, 2008). However, this effect was not found here, where the instructions asked participants to consider the visual precision of their representations: the precision-focused instruction asked participants to hold a precise visual image in mind, and the capacity-focused instruction asked them to hold the correct number of visual items in mind (capacity). This calls into question the role of consciousness in MI compared with VWM. Vividness ratings were higher at smaller set sizes, and capacity ratings increased with increasing set size, yet there were no effects of instruction on vividness or capacity ratings. This suggests that the type of instruction did not modulate individuals' subjective experience of the number of items held in mind, which is perhaps unsurprising given that the ratings are subjective in nature. New evidence suggests that, when encouraged to use imagery, those with high imagery vividness perform better on a VWM task than those with low imagery vividness (Slinn, Nikodemova, Rosinski, & Dijkstra, 2023). Future research with adequately powered sample sizes and trial numbers should test the nuances of instruction, as in our study, and how these might differentially modulate VWM accuracy depending on imagery vividness group.

The effects of attended side on proportion correct and CDA amplitude are notable. Proportion correct was greater in right attended than in left attended trials in the capacity-focused condition and four-item trials only. Comparatively, greater CDA amplitudes were observed in coarse than in fine trials in right attended but not left attended trials in the capacity-focused blocks. Although laterality differences were not initially hypothesized, the suggestion of hemispheric differences in proportion correct is consistent with recent findings in a similar paradigm. Namely, Machizawa and colleagues (2020) reported that behavioral performance and CDA amplitudes in their precision-focused instruction condition (fine trials only) were associated with gray matter volume in the right parietal cortex, whereas behavioral performance and CDA amplitudes in their capacity-focused condition (coarse trials only) were associated with gray matter volume in the left lateral occipital cortex. The findings presented here, which are specific to the largest set size when participants were required to rate the number of items in mind (capacity rating) and were following a capacity-focused instruction, support the indication of left hemispheric specialization of VWM capacity.

The finding of greater CDA amplitude in coarse than in fine trials in right attend trials only is perhaps not entirely surprising, as it is partially in line with the association between coarse (capacity-focused) performance and left lateral occipital volume in Machizawa and colleagues' (2020) study, although in their study coarse precision was cued. The difference between coarse and fine trials is nonetheless unexpected here, given that precision (fine, coarse) was not cued in the current study. Because the three- and four-way interactions involve individual conditions with limited numbers of trials per condition, it is not possible to draw general conclusions regarding hemispheric differences in CDA from these findings, and further research is warranted.

Distinction between Subjective MI Ratings and the Visual Contents of VWM

We failed to find evidence for a significant relationship between subjective MI ratings and the vividness and capacity of VWM, except that vividness ratings were significantly correlated with proportion correct in one-item trials, but not in two- or four-item trials. This result may be explained by willful control of VWM resources at low set sizes (Machizawa et al., 2012; Zhang & Luck, 2008). Whether willful control of these resources and awareness of the perceived resolution of MI are also constrained by VWM capacity should be examined in future studies. Proportion correct was higher for nondivergent than for divergent capacity ratings, indicating that individuals have some insight into the number of items held in mind. There were no significant differences in CDA amplitude between high and low vividness rating trials and no significant association between the CDA set size effect and vividness ratings. The important term here is “subjective”: we can draw conclusions about individuals' subjective insight into their MI during this task, rather than their explicit ability in MI. Therefore, we would not rule out a functional relationship between VWM and MI, or between VWM and CDA, on the basis of this evidence.

In Pearson and Keogh's (2019) review, they argued that individual differences in the neural correlates of VWM may depend on the types of strategies recruited in VWM (i.e., imagery strategies vs. propositional strategies akin to general thought) and that measuring the strategies recruited in VWM tasks might explain these individual differences. The study presented here directly addresses this proposition by measuring trial-by-trial subjective ratings of MI within a VWM task. However, we failed to find evidence for a relationship between self-reported subjective ratings/MI strategies and the precision and capacity at which visual information is held in mind. First, propositional/verbal strategies are unlikely in this task given the very short stimulus presentation (200 msec) and delay period (1400 msec). Moreover, previously reported modulations of proportion correct and CDA amplitude by instruction suggest that individuals have flexible control over the precision and capacity at which visual information is held in mind, as discussed in detail above. In the few studies that have investigated the relationship between behavioral outcomes in VWM and MI, some have found an association (Keogh & Pearson, 2011, 2014), whereas others have not (Bates & Farran, 2021). For example, MI sensory strength was positively associated with VWM capacity at Set Size 3 (Keogh & Pearson, 2014), and only VWM performance in those with high MI sensory strength was disrupted by background luminance manipulations (Keogh & Pearson, 2011, 2014). Although these studies argued that individuals with stronger MI recruit MI strategies in VWM, the findings presented here call into question whether assessing subjective strategies in VWM is akin to behavioral and neural indices of the visual precision and number of visual items maintained in VWM.

General Considerations and Limitations

It is important to consider potential methodological constraints. Previous evidence has suggested a relationship between saccades and MI in that participants tend to make similar gaze patterns when imagining a previously viewed stimulus as when viewing it, known as the “looking at nothing” effect (Johansson & Johansson, 2014; Brandt & Stark, 1997). Given the nature of EEG data, trials with saccades contain artifacts that must be removed before analysis. Although ICA was conducted to retain as many trials as possible and more than 75% of trials were retained per participant in this study, this is an important consideration, given that high-MI trials may have been rejected because of saccade artifacts. That said, Gurtner, Hartmann, and Mast (2021), examining gaze patterns during MI, found that gaze patterns were not associated with vividness of MI as measured by the Vividness of Visual Imagery Questionnaire. It therefore appears unlikely that rejection of saccade trials influenced the results examining the link between MI and VWM in this study. Future research examining gaze patterns alongside subjective ratings of MI within a VWM task would further elucidate this relationship.

It is also notable that participants rarely rated at either end of the rating scales; for example, individuals rarely reported having four items in mind in the number ratings. The fact that the vividness rating scale ranged from 1 to 4 while the capacity rating scale ranged from 0 to 4 could have confused participants; a study with only the number report might eliminate such confusion. However, the vividness rating scale was chosen to be in line with previous studies (Dijkstra et al., 2017; Pearson et al., 2011). This is the first study to adopt a capacity rating scale, and it appears individuals are reluctant to rate at either end of the scale. One previous study used a continuous scale (i.e., a visual analog scale) for rating vividness with a sliding bar (Dijkstra, Ambrogioni, Vidaurre, & van Gerven, 2020); however, responses broadly fell into the 1–4 category ratings and were therefore binned as such. Further research is required to test whether ratings are distorted by the Likert scale. The findings regarding the link between MI and VWM are somewhat limited by infrequent responses: for example, some participants did not rate any four-item trials as low vividness, so it was not possible to test the relationship between individual differences in ratings and CDA for each set size or instruction. Moreover, our vividness rating did not capture the strength or contrast of mental images, which is another important facet of imagery vividness (Riley & Davies, 2023). Future studies sampling participants based on low vividness, high vividness, nondivergent, and divergent ratings, and including more detailed assessments of vividness, would be useful to further examine individual differences.

In addition, it is important to recognize the limited sample size and its implications for statistical power. Twenty-three participants were recruited, in line with previous studies demonstrating robust CDA effects of precision and capacity in VWM (Machizawa et al., 2012, 2020); however, because of exclusions, only 17 participants remained in the final sample. We subsequently conducted Bayes factor analyses of the main effects with null findings, but some of the outcomes were inconclusive; further replication is therefore needed. The instruction condition was added after piloting to reduce difficulty and to replace the expected cueing of precision (fine, coarse). Although this allowed us to investigate questions regarding expectation, it rendered a five-factorial design with low power. Findings should therefore be interpreted with caution, and future replications should consider a simplified design. Small sample sizes are a common issue in neuroimaging studies (Button et al., 2013), given the resource and time constraints associated with this research. Recently, it was suggested that only 30–50 trials are sufficient to detect the presence of the CDA, but that up to 400 trials per condition could be required to detect differences between Set Sizes 2 and 4 (Ngiam et al., 2021). Although this is informative, up to 400 trials per condition is practically very difficult, as it would lead to lengthy experiments and therefore participant fatigue and boredom, which would themselves distort the data. It is important to strike a balance in methodological design and to take sample size and trial numbers into account when drawing conclusions from analyses of CDA. Of note, we had a relatively sufficient and feasible number of trials for simple main effect comparisons (i.e., approximately 90–120 trials per set size, collapsing across the other factors).

Conclusion

Ultimately, this study provides a much-needed account of the interaction between subjective ratings of MI and the behavioral and neural correlates of VWM. Contrary to our hypotheses, participants appear to have poor insight into both the visual precision and the capacity of representations held in VWM. Rather than providing a novel method for measuring the role of MI in VWM using subjective ratings, we failed to find evidence for a relationship between the subjective sensory experience of MI and the visual precision and capacity of VWM. Because our analyses were based mostly on averaged scores, future trial-by-trial investigations may reveal momentary associations or dissociations between MI and VWM. This work has methodological implications for examining how individual differences in MI support VWM and contributes to theoretical interpretations of the role of consciousness in VWM.

Maro G. Machizawa has a leadership role and owns stock in Xiberlinc Inc. All other authors declare no conflict(s) of interest.

Corresponding author: Kathryn E. Bates, Department of Psychology, King's College London, United Kingdom, or via e-mail: [email protected].

The participants of this study did not give written consent for their data to be shared publicly; therefore, the data are not available.

Kathryn E. Bates: Conceptualization; Data curation; Formal analysis; Funding acquisition; Investigation; Methodology; Visualization; Writing—Original draft. Marie L. Smith: Resources; Software; Writing—Review & editing. Emily K. Farran: Conceptualization; Funding Acquisition; Methodology; Supervision; Writing—Review & editing. Maro G. Machizawa: Conceptualization; Methodology; Supervision; Writing—Review & editing.

This research was supported by a 1 + 3 ESRC PhD Studentship (https://dx.doi.org/10.13039/501100000269), grant number: 1788622 awarded to K. E. B., and the JST COI grants (JPMJCE1311; JPMJCA2208) and JST Moonshot Goal 9 (https://dx.doi.org/10.13039/501100020963), grant number: JPMJMS2296 plus Hiroshima University Grant-in-Aid Basic Research to M. G. M.

Retrospective analysis of the citations in every article published in this journal from 2010 to 2021 reveals a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience (JoCN) during this period were M(an)/M = .407, W(oman)/M = .32, M/W = .115, and W/W = .159, the comparable proportions for the articles that these authorship teams cited were M/M = .549, W/M = .257, M/W = .109, and W/W = .085 (Postle and Fulvio, JoCN, 34:1, pp. 1–3). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article's gender citation balance. The authors of this paper report its proportions of citations by gender category to be: M/M = .583; W/M = .333; M/W = .083; W/W = 0.

References

Adam, K. C. S., Robison, M. K., & Vogel, E. K. (2018). Contralateral delay activity tracks fluctuations in working memory performance. Journal of Cognitive Neuroscience, 30, 1229–1240.
Albers, A. M., Kok, P., Toni, I., Dijkerman, H. C., & De Lange, F. P. (2013). Shared representations for working memory and mental imagery in early visual cortex. Current Biology, 23, 1427–1431.
Baddeley, A. (2003). Working memory: Looking back and looking forward. Nature Reviews Neuroscience, 4, 829–839.
Baddeley, A. D., & Andrade, J. (2000). Working memory and the vividness of imagery. Journal of Experimental Psychology: General, 129, 126–145.
Bates, K. E., & Farran, E. K. (2021). Mental imagery and visual working memory abilities appear to be unrelated in childhood: Evidence for individual differences in strategy use. Cognitive Development, 60, 101120.
Blanca, M. J., Alarcón, R., Arnau, J., Bono, R., & Bendayan, R. (2017). Non-normal data: Is ANOVA still a valid option? Psicothema, 29, 552–557.
Brandt, S. A., & Stark, L. W. (1997). Spontaneous eye movements during visual imagery reflect the content of the visual scene. Journal of Cognitive Neuroscience, 9, 27–38.
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., et al. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365–376.
Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24, 87–114.
Delorme, A., & Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, 134, 9–21.
Dijkstra, N., Ambrogioni, L., Vidaurre, D., & van Gerven, M. A. J. (2020). Neural dynamics of perceptual inference and its reversal during imagery. eLife, 9, e53588.
Dijkstra, N., Bosch, S. E., & van Gerven, M. A. J. (2017). Vividness of visual imagery depends on the neural overlap with perception in visual areas. Journal of Neuroscience, 37, 1367–1373.
Drisdelle, B. L., Aubin, S., & Jolicoeur, P. (2017). Dealing with ocular artifacts on lateralised ERPs in studies of visual-spatial attention and memory: ICA correction versus epoch rejection. Psychophysiology, 54, 83–99.
Forsberg, A., Fellman, D., Laine, M., Johnson, W., & Logie, R. H. (2020). Strategy mediation in working memory training in younger and older adults. Quarterly Journal of Experimental Psychology, 73, 1206–1226.
Gurtner, L. M., Hartmann, M., & Mast, F. W. (2021). Eye movements during visual imagery and perception show spatial correspondence but have unique temporal signatures. Cognition, 210, 104597.
Harrison, S. A., & Tong, F. (2009). Decoding reveals the contents of visual working memory in early visual areas. Nature, 458, 632–635.
Jacobs, C., Schwarzkopf, D. S., & Silvanto, J. (2018). Visual working memory performance in aphantasia. Cortex, 105, 61–73.
Johansson, R., & Johansson, M. (2014). Look here, eye movements play a functional role in memory retrieval. Psychological Science, 25, 236–242.
Keogh, R., & Pearson, J. (2011). Mental imagery and visual working memory. PLoS One, 6, e29221.
Keogh, R., & Pearson, J. (2014). The sensory strength of voluntary visual imagery predicts visual working memory capacity. Journal of Vision, 14, 7.
Kosslyn, S. M. (1980). Image and mind. Harvard University Press.
Linke, A. C., Vicente-Grabovetsky, A., Mitchell, D. J., & Cusack, R. (2011). Encoding strategy accounts for individual differences in change detection measures of VSTM. Neuropsychologia, 49, 1476–1486.
Logie, R. H. (1995). Visuo-spatial working memory. Lawrence Erlbaum Associates.
Lorenc, E. S., Lee, T. G., Chen, A. J.-W., & D'Esposito, M. (2015). The effect of disruption of prefrontal cortical function with transcranial magnetic stimulation on visual working memory. Frontiers in Systems Neuroscience, 9, 169.
Luria, R., Balaban, H., Awh, E., & Vogel, E. K. (2016). The contralateral delay activity as a neural measure of visual working memory. Neuroscience and Biobehavioral Reviews, 62, 100–108.
Machizawa, M. G., Driver, J., & Watanabe, T. (2020). Gray matter volume in different cortical structures dissociably relates to individual differences in capacity and precision of visual working memory. Cerebral Cortex, 30, 4759–4770.
Machizawa, M. G., Goh, C. C. W., & Driver, J. (2012). Human visual short-term memory precision can be varied at will when the number of retained items is low. Psychological Science, 23, 554–559.
Marks, D. F. (1995). New directions for mental imagery research. Journal of Mental Imagery, 19, 153–167.
Miller, B. T., & D'Esposito, M. (2005). Searching for 'the top' in top–down control. Neuron, 48, 535–538.
Ngiam, W. X. Q., Adam, K. C. S., Quirk, C., Vogel, E. K., & Awh, E. (2021). Estimating the statistical power to detect set size effects in contralateral delay activity. Psychophysiology, 58, e13791.
Pearson, J. (2019). The human imagination: The cognitive neuroscience of visual mental imagery. Nature Reviews Neuroscience, 20, 624–634.
Pearson, J., Clifford, C., & Tong, F. (2008). The functional impact of mental imagery on conscious perception. Current Biology, 18, 982–986.
Pearson, J., & Keogh, R. (2019). Redefining visual working memory: A cognitive-strategy, brain-region approach. Current Directions in Psychological Science, 28, 266–273.
Pearson, J., Rademaker, R., & Tong, F. (2011). Evaluating the mind's eye: The metacognition of visual imagery. Psychological Science, 22, 1535–1542.
Pion-Tonachini, L., Kreutz-Delgado, K., & Makeig, S. (2019). ICLabel: An automated electroencephalographic independent component classifier, dataset, and website. Neuroimage, 198, 181–197.
Pounder, Z., Jacob, J., Evans, S., Loveday, C., Eardley, A. F., & Silvanto, J. (2022). Only minimal differences between individuals with congenital aphantasia and those with typical imagery on neuropsychological tasks that involve imagery. Cortex, 148, 180–192.
Rademaker, R. L., & Pearson, J. (2012). Training visual imagery: Improvements of metacognition, but not imagery strength. Frontiers in Psychology, 3, 224.
Riley, S. N., & Davies, J. (2023). Vividness as the similarity between generated imagery and an internal model. Brain and Cognition, 169, 105988.
Serences, J. T. (2016). Neural mechanisms of information storage in visual short-term memory. Vision Research, 128, 53–67.
Slinn, C., Nikodemova, Z., Rosinski, A., & Dijkstra, N. (2023). Vividness of visual imagery predicts performance on a visual working memory task when an imagery strategy is encouraged. PsyArXiv.
Spagna, A., Hajhajate, D., Liu, J., & Bartolomeo, P. (2021). Visual mental imagery engages the left fusiform gyrus, but not the early visual cortex: A meta-analysis of neuroimaging evidence. Neuroscience & Biobehavioral Reviews, 122, 201–217.
Sreenivasan, K. K., Curtis, C. E., & D'Esposito, M. (2014). Revisiting the role of persistent neural activity during working memory. Trends in Cognitive Sciences, 18, 82–89.
Tong, F. (2013). Imagery and visual working memory: One and the same? Trends in Cognitive Sciences, 17, 489–490.
Vogel, E. K., & Machizawa, M. G. (2004). Neural activity predicts individual differences in visual working memory capacity. Nature, 428, 748–751.
Vogel, E. K., McCollough, A. W., & Machizawa, M. G. (2005). Neural measures reveal individual differences in controlling access to working memory. Nature, 438, 500–503.
Williams, J. R., Robinson, M. M., Schurgin, M. W., Wixted, J. T., & Brady, T. F. (2022). You cannot "count" how many items people remember in visual working memory: The importance of signal detection-based measures for understanding change detection performance. Journal of Experimental Psychology: Human Perception and Performance, 48, 1390.
Zhang, W., & Luck, S. J. (2008). Discrete fixed-resolution representations in visual working memory. Nature, 453, 233–235.

Author notes

* Joint senior authors.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.