Recent studies have demonstrated that the numerosity of visually presented dot arrays is represented in low-level visual cortex extremely early in latency. However, whether or not such an early neural signature reflects the perceptual representation of numerosity remains unknown. Alternatively, such a signature may indicate the raw sensory representation of the dot-array stimulus before becoming the perceived representation of numerosity. Here, we addressed this question by using the connectedness illusion, whereby arrays with pairwise connected dots are perceived to be less numerous compared with arrays containing isolated dots. Using EEG and fMRI in two independent experiments, we measured neural responses to dot-array stimuli comprising 16 or 32 dots, either isolated or pairwise connected. The effect of connectedness, which reflects the segmentation of the visual stimulus into perceptual units, was observed in the neural activity after 150 msec post stimulus onset in the EEG experiment and in area V3 in the fMRI experiment using a multivariate pattern analysis. In contrast, earlier neural activity before 100 msec and in area V2 was strictly modulated by numerosity regardless of connectedness, suggesting that this early activity reflects the sensory representation of a dot array before perceptual segmentation. Our findings thus demonstrate that the neural representation for numerosity in early visual cortex is not sufficient for visual number perception and suggest that the perceptual encoding of numerosity occurs at or after the segmentation process that takes place later in area V3.
Among the many properties of our rich visual environment, numerical magnitude is one of the crucial dimensions needed to achieve a basic description of the external environment. A growing body of evidence suggests that humans (e.g., Dehaene, 2011; Gallistel & Gelman, 1992) and many other animal species (e.g., Piantadosi & Cantlon, 2017; Rugani, Vallortigara, Priftis, & Regolin, 2015; Agrillo, Dadda, Serena, & Bisazza, 2008; Pepperberg, 2006) are endowed with a primitive sense of number allowing a rapid estimation or comparison of approximate numerical magnitudes. In this context, approximate numerical magnitude could indeed be regarded as a fundamental perceptual dimension (Burr, Anobile, & Arrighi, 2018; Anobile, Cicchini, & Burr, 2016), processed by a dedicated and largely format- and modality-independent mechanism (Anobile, Arrighi, Togoli, & Burr, 2016; Cicchini, Anobile, & Burr, 2016; Arrighi, Togoli, & Burr, 2014). Arguments against the existence of a visual sense of number have been advanced by other studies (e.g., Dakin, Tibber, Greenwood, Kingdom, & Morgan, 2011; Durgin, 2008), although a growing body of evidence has ruled out such alternative explanations (e.g., see Anobile, Cicchini, et al., 2016).
Despite the fundamental nature of numerosity processing, however, the key cortical mechanisms allowing the extraction and representation of numerical magnitudes remain unclear. A plethora of studies to date have implicated the parietal cortex as the core neural region involved in numerosity perception. Specifically, neurons selectively tuned to numerosity have been highlighted in single-cell recording studies (Viswanathan & Nieder, 2013; but see Chen & Verguts, 2013, arguing against the existence of number-selective responses). Likewise, tuning curves for numerosity and a topographic organization of numerosity-selective responses have been reported in humans using fMRI (Harvey, Klein, Petridou, & Dumoulin, 2013; Piazza, Izard, Pinel, Le Bihan, & Dehaene, 2004).
In contrast to these previous investigations, recent studies have increasingly highlighted the role of the early visual cortex for numerosity processing. Particularly, a number of studies have now demonstrated that numerical information is encoded even at extremely early latencies and across multiple stages comprising early visual areas (Fornaciai, Brannon, Woldorff, & Park, 2017; Fornaciai & Park, 2017b; Park, DeWind, Woldorff, & Brannon, 2016; Roggeman, Santens, Fias, & Verguts, 2011). For instance, previous ERP studies using dot arrays demonstrate that numerosity-sensitive neural activity emerges as early as 75 msec after stimulus onset (Park et al., 2016), originating from early visual areas such as V2 and V3, with possible contributions even from V1 (Fornaciai et al., 2017). These findings obtained with EEG seem to parallel other neuroimaging studies showing numerosity-related activation in early visual areas by means of fMRI. For example, Roggeman et al. (2011) identified a first stage of numerosity processing in the inferior occipital gyrus, and Cavdaroglu, Katz, and Knops (2015) found numerosity-sensitive activity in similar occipital regions. Taken together, these recent studies thus suggest that the representation of numerosity emerges earlier in the visual stream than previously thought; however, the specific nature of early cortical representation of numerosity remains unclear.
In particular, one crucial remaining question is whether such early activity represents a sufficient correlate of numerosity perception. In other words, does the early visual cortical activity (V1, V2, and V3 around 75 msec) represent the content of the subjective perceptual experience? To define the properties of a “sufficient” numerical representation, the characteristics of numerosity perception should be taken into account. For instance, it has been shown that numerosity perception is based on perceptual units defined by the topological properties of visual images (He, Zhou, Zhou, He, & Chen, 2015), which under specific circumstances could induce distortions in the perceived magnitude of a visual stimulus (Fornaciai, Cicchini, & Burr, 2016; Franconeri, Bemis, & Alvarez, 2009; He, Zhang, Zhou, & Chen, 2009). To be considered a sufficient correlate, a neural signature of numerical representation should then reflect such topological properties of visual images and the segmentation of elements into perceptual units. Alternatively, such an early neural signal may indicate the initial raw sensory representation of the visual stimulus before segmentation, thus still coded in a format only weakly related to the emerging percept.
In this study, we used EEG and fMRI (in two separate experiments), in combination with multivariate pattern analysis techniques, to address the role of early neurophysiological signals and early visual areas in numerosity representation. To do so, we exploited the connectedness illusion, which has been shown to highlight the perceptual segmentation processes leading to the representation of numerosity. Under this illusion, the numerical magnitude of an array of dots connected pairwise by task-irrelevant lines is robustly underestimated compared with an array with the same number of isolated dots (Franconeri et al., 2009; He et al., 2009). On the one hand, if a processing stage in the visual stream encodes the numerical magnitude of the perceptual units contained in a visual stimulus, the neural activity should be maximally sensitive to the connectedness of the dot arrays. In turn, the effect of connectedness would reflect the processes giving rise to the perceptual units underlying the perception of numerical magnitude. On the other hand, if a processing stage in the visual stream encodes the raw sensory representation of the stimulus, the neural activity should be maximally sensitive to the numerosity of the dot arrays regardless of their connectedness. Such an effect would reflect processing stages before the segmentation of visual images into perceptual units, which is not sufficient to explain the properties of numerosity perception observed at the phenomenological/behavioral level.
Twenty participants (13 women; age ranging from 18 to 27 years, mean ± SD age = 21.2 ± 1.8 years) took part in the EEG experiment. Another group of 20 participants (16 women; age ranging from 18 to 27 years, mean ± SD age = 20.75 ± 2.2 years) took part in the fMRI experiment, but one participant was excluded from analysis because of excessive movements inside the scanner (see Image Preprocessing below), leaving 19 participants in the fMRI experiment. All the participants signed a written informed consent before participating in the experiment and were rewarded for their time with monetary compensation ($10/hr in the EEG experiment; $15/hr in the fMRI experiment). All the participants included in the study were naive to the purpose of the experiment and had normal or corrected-to-normal vision. All the experimental procedures were approved by the University of Massachusetts institutional review board and were in line with the Declaration of Helsinki. Note that, although female participants represent most participants in our study, the basic perceptual processes addressed in our experiments are not expected to be influenced by participants' sex.
In both the EEG and fMRI experiments, visual stimuli were generated using the Psychophysics Toolbox (Kleiner et al., 2007; Brainard, 1997; Pelli, 1997) on MATLAB (Version r2013b; The Mathworks, Inc.). In the EEG experiment, stimuli were presented on a monitor screen running at 144 Hz, encompassing approximately 34° × 19° of visual angle from the viewing distance of 90 cm, and with a resolution of 1920 × 1080 pixels. In the fMRI experiment, stimuli were presented on an MRI-compatible monitor screen positioned behind the scanner (60 Hz, 1920 × 1080 resolution), made visible to the participants by means of a mirror located inside the scanner, and encompassing 28° × 16° from a visual distance of about 137 cm. In both experiments, stimuli were arrays of black dots presented on a gray background, which could be either pairwise connected by straight lines (henceforth named connected dots) or presented as isolated dots (henceforth named isolated dots). The dot arrays were generated off-line and systematically constructed to span approximately equal ranges on three orthogonal dimensions of numerosity, size, and spacing (as in Park et al., 2016; DeWind, Adams, Platt, & Brannon, 2015). Such stimulus design was used in this study for a systematic sampling of numerical and nonnumerical attributes of a dot array, although the comparison between numerical and nonnumerical effects on neural activity was not the primary focus of the current study (for those studies, see Park, 2018; Fornaciai & Park, 2017a; Park et al., 2016; DeWind et al., 2015). For more details about this stimuli construction scheme, see DeWind et al. (2015) and Park et al. (2016).
In the EEG and fMRI experiments (see Task and Procedure), the dot arrays were generated from one of eight combinations of parameters, comprising two levels of numerosity (16 and 32 dots) by two levels of size by two levels of spacing. In terms of the specific parameters, arrays of 16 dots were drawn with an individual dot area of 0.067 or 0.093 deg² and orthogonally with a field area of 5.25 or 7.44 deg². Arrays of 32 dots were drawn with an individual dot area of 0.046 or 0.067 deg² and orthogonally with a field area of 7.44 or 10.44 deg². Furthermore, dot arrays drawn from each of the eight parameter combinations were either pairwise connected or isolated (see Figure 1), raising the total to 16 unique conditions. All the stimuli were generated using the same off-line routine, which iteratively calculates the coordinates of each dot to keep the items separated by at least 0.75°, while calculating the possible connections between pairs of dots to avoid lines crossing each other or other dots. Then, during the experiment, lines could be either presented or not on the same configuration of dots to display a connected or an isolated dot array. This procedure ensured that, in both cases, the dot arrays were generated according to the same constraints, thus keeping the low-level statistics of the stimuli (i.e., regularity) similar. The same set of stimuli was used for both the EEG and fMRI experiments.
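As an illustration of the factorial structure described above, the following minimal sketch enumerates the 16 unique conditions from the parameter values reported in the text. The dictionary layout and the function name `build_conditions` are hypothetical and are not taken from the authors' stimulus-generation routine.

```python
from itertools import product

# Parameter levels as reported in the text (areas in squared degrees of visual angle).
LEVELS = {
    16: {"dot_area": (0.067, 0.093), "field_area": (5.25, 7.44)},
    32: {"dot_area": (0.046, 0.067), "field_area": (7.44, 10.44)},
}


def build_conditions():
    """Return one dict per unique stimulus condition (hypothetical layout)."""
    conditions = []
    for numerosity, levels in LEVELS.items():
        # Cross dot area, field area, and connectedness within each numerosity.
        for dot_area, field_area, connected in product(
            levels["dot_area"], levels["field_area"], (False, True)
        ):
            conditions.append(
                {
                    "numerosity": numerosity,
                    "dot_area": dot_area,
                    "field_area": field_area,
                    "connected": connected,
                }
            )
    return conditions


conditions = build_conditions()
print(len(conditions))  # 2 numerosities x 2 sizes x 2 spacings x 2 connectedness levels
```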
In the psychophysical part of the study (which was identical in both the EEG and fMRI experiments), where participants judged which of two dot arrays contained more dots (see Task and Procedure), the dot arrays generated by the aforementioned parameters served as one of the two dot arrays (the “reference” stimulus) presented on each trial. The other dot array (the “probe” stimulus) varied widely in numerosity (8–28 or 16–56 dots, when paired with a 16- or 32-dot reference stimulus, respectively) and was generated to match the reference stimulus (1) in total dot area and in sparsity, (2) in individual dot area and in sparsity, (3) in total dot area and in field area, or (4) in individual dot area and in field area, with equal probability. In the analysis of behavioral data, such different types of probe stimuli were collapsed together.
Task and Procedure
The EEG experiment took place in a quiet and dimly illuminated room. Participants sat in front of a monitor screen and were instructed to keep their gaze on a central fixation cross, which was briefly presented before the presentation of the first stimulus and during the stimulus onset asynchrony (SoA) between consecutive stimuli. Dot arrays were presented serially at the center of the screen, for a duration of 150 msec and with a variable SoA drawn from a uniform distribution between 500 and 700 msec. The experiment mostly required the participants to passively view the stream of stimuli. However, to ensure that participants kept their attention on the screen, they were required to perform a simple color oddball detection task. Namely, at unpredictable times, the dot array was presented in red, and participants were instructed to press a button on a joypad as soon as possible when they detected a red stimulus. Each of the 16 unique stimulus types was presented 25 times in each block. Of the 400 stimuli presented in each block, 16 were presented as oddballs, separated by a number of standard trials randomly chosen between 9 and 19. No other instructions were given to the participants, and nothing about number or magnitude was mentioned from recruitment until participants had completed this EEG part of the study. The average RT in the color oddball task was 412 ± 48 msec, and the average hit rate was 98% ± 2.3%. Each participant completed six blocks of 400 trials.
The procedure of the fMRI experiment was largely identical to that of the EEG experiment. While lying down within the scanner and watching the screen by means of a mirror, participants were instructed to keep their gaze on the central fixation point and to pay attention to the stimuli. The experiment comprised six runs. During each run, each of the 16 unique stimulus types was presented six times. Ninety-six dot array stimuli with 48 null events (i.e., presentation of the fixation cross alone instead of a stimulus, making the trial indistinguishable from an SoA) were presented sequentially (duration = 150 msec) with an SoA of 2000 msec. Null events were limited to a maximum of two consecutive events. Thus, considering the null events, the effective SoA ranged between 2000 and 6000 msec. Participants performed a color oddball detection task as in the EEG experiment, responding to red stimuli by pressing a button on a response device. In each run, eight stimuli were presented as oddballs. Besides the oddball task, no other instructions were given to the participants, and nothing about number or magnitude was mentioned until the completion of the fMRI scan. The average RT in the color oddball task was 411 ± 52 msec, and the average hit rate was 98% ± 4.6%.
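The constraint on null events can be made concrete with a short sketch. The text does not describe how the event order was randomized, so the rejection-sampling approach below (reshuffle until no more than two nulls occur in a row) is an assumption, and all function names are hypothetical. It shows why the stimulus-to-stimulus interval stays between 2000 and 6000 msec.

```python
import random

N_STIM, N_NULL = 96, 48   # events per run, as reported in the text
SOA_MS = 2000             # nominal onset-to-onset interval
MAX_NULL_RUN = 2          # at most two consecutive null events


def longest_null_run(events):
    """Length of the longest streak of consecutive null events."""
    longest = current = 0
    for event in events:
        current = current + 1 if event == "null" else 0
        longest = max(longest, current)
    return longest


def make_run(rng, max_attempts=100_000):
    """Shuffle stimuli and nulls until the null-run constraint holds (assumed method)."""
    events = ["stim"] * N_STIM + ["null"] * N_NULL
    for _ in range(max_attempts):
        rng.shuffle(events)
        if longest_null_run(events) <= MAX_NULL_RUN:
            return list(events)
    raise RuntimeError("no valid sequence found")


def stimulus_soas(events):
    """Onset-to-onset intervals (msec) between consecutive stimuli."""
    onsets = [i * SOA_MS for i, event in enumerate(events) if event == "stim"]
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


run = make_run(random.Random(0))
soas = stimulus_soas(run)
print(len(run), min(soas), max(soas))
```

With 144 events at 2000 msec each, one run of this sketch lasts 288 sec, which corresponds to 144 volumes at a repetition time of 2000 msec.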
After either the EEG or fMRI experiment, participants performed a numerosity discrimination task, composed of four blocks of 128 trials. The primary purpose of this second experiment was to confirm previous results demonstrating underestimation of numerosity when dots are pairwise connected (Fornaciai et al., 2016; Franconeri et al., 2009; He et al., 2009), but with our unique set of dot array stimuli. Participants were instructed to compare two dot arrays, one comprising 16 or 32 either pairwise connected or isolated dots (reference stimulus), with a probe stimulus comprising a variable number of unconnected dots (randomly chosen on each trial, ranging from −0.30 to +0.24 logarithmic units of difference with respect to the reference numerosity; note that we used an asymmetric range with larger values at the lower end according to the expected underestimation of connected stimuli). Probe and reference stimuli were presented simultaneously for 150 msec on the right and left of a central fixation point (with their position randomly chosen on each trial). Participants were asked to discriminate which one contained more dots by pressing the appropriate key on a standard keyboard. There were no time limitations for the response, and after a keypress, the subsequent trial was initiated with a random intertrial interval ranging from 400 to 500 msec. All different combinations of numerosity and connectedness were randomly intermixed within each block and were presented with equal probability.
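The probe range in logarithmic units can be converted to dot counts with a short calculation. The sketch below assumes base-10 logarithms and rounding to the nearest integer. Neither is stated explicitly in the text, but this choice reproduces the probe ranges reported in the Stimuli description (8–28 and 16–56 dots). The function name `probe_range` is hypothetical.

```python
def probe_range(reference, log_min=-0.30, log_max=0.24):
    """Smallest and largest probe numerosity for a given reference.

    Assumes the reported offsets are base-10 log units relative to the reference.
    """
    low = round(reference * 10 ** log_min)
    high = round(reference * 10 ** log_max)
    return low, high


for reference in (16, 32):
    print(reference, probe_range(reference))
```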
Behavioral Data Analysis
EEG Electrophysiological Recording and Analysis
EEG was continuously recorded for the entire duration of the EEG experiment (actiCHAmp, Brain Products, GmbH), using a 64-channel, extended-coverage, triangulated equidistance cap (M10, EasyCap, GmbH), with a sampling rate of 1000 Hz. During the recording, all the channels were referenced to the vertex (Cz). To monitor artifacts due to blinks or eye movements, the EOG was monitored by means of electrodes positioned below the left eye and lateral to the left and right canthi. Channel impedances were usually kept below 15 kΩ. However, on some occasions, impedances up to 35 kΩ were tolerated.
The EEG data were analyzed off-line in MATLAB (Version R2013b), using the functions provided by the EEGLAB software package (Delorme & Makeig, 2004) and the ERPLAB toolbox (Lopez-Calderon & Luck, 2014). During the off-line analysis, the EEG signals were high-pass filtered (0.1 Hz) and were rereferenced to the average value of all the 64 channels. The continuous EEG data were then segmented into epochs from 100 msec before to 400 msec after stimulus onset, with a baseline correction using the prestimulus interval. We then excluded trials containing eye-blink artifacts by applying the step-like artifact rejection tool provided by EEGLAB. Trials were rejected whenever activity from the eye channels exceeded a threshold equal to 30 μV (in a time window spanning 400 msec, with 20-msec steps). This procedure led to an average (± SD) rejection rate of 7.97% (± 6.40%). Finally, the epochs were selectively averaged for each of the 16 stimulus types, followed by a low-pass filter (30 Hz) before computing the grand average.
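The logic of a step-like artifact test can be sketched as follows. This is a simplified illustration under stated assumptions, not the toolbox implementation: a window slides over the eye-channel signal, the mean of its second half is compared with the mean of its first half, and the epoch is flagged if the absolute difference exceeds the threshold. The function name and the single-channel input format are hypothetical; the window length, step size, threshold, and sampling rate follow the values reported in the text.

```python
def has_step_artifact(signal_uv, srate_hz=1000, window_ms=400, step_ms=20,
                      threshold_uv=30.0):
    """Return True if any sliding window contains a step larger than the threshold.

    signal_uv: one epoch of one eye channel, in microvolts (list of floats).
    """
    window = int(window_ms * srate_hz / 1000)
    step = int(step_ms * srate_hz / 1000)
    half = window // 2
    for start in range(0, len(signal_uv) - window + 1, step):
        first = signal_uv[start:start + half]
        second = signal_uv[start + half:start + window]
        # Difference between the mean voltage of the two window halves.
        difference = sum(second) / len(second) - sum(first) / len(first)
        if abs(difference) > threshold_uv:
            return True
    return False


# Hypothetical 500-msec epochs sampled at 1000 Hz.
flat_epoch = [0.0] * 500
blink_like_epoch = [0.0] * 250 + [50.0] * 250  # abrupt 50-microvolt shift
print(has_step_artifact(flat_epoch), has_step_artifact(blink_like_epoch))
```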
On the basis of previous results showing magnitude-sensitive ERP peaks (Fornaciai & Park, 2017a; Park et al., 2016), our primary analysis was focused on a specific channel of interest and a specific latency window. In particular, in line with those previous studies, we chose the occipital channel Oz and a latency window spanning from 70 to 100 msec after stimulus onset, which encompasses the numerosity-sensitive peaks found in previous studies (75 msec in Park et al., 2016, and 88 msec in Fornaciai & Park, 2017a). Visual evoked potentials corresponding to the different classes of stimuli, collapsed according to numerosity and connectedness (i.e., 16 isolated, 32 isolated, 16 connected, 32 connected), were averaged in that channel and time window. The distribution of responses across the group of participants was then tested using a two-way repeated-measures ANOVA with factors Numerosity (16 vs. 32) and Connectedness (isolated vs. connected).
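In a 2 × 2 design in which both factors are within-subject, each main effect has one numerator degree of freedom, and its F statistic equals the squared one-sample t statistic of the per-participant contrast. The sketch below uses this equivalence to compute F(1, n − 1) for either main effect. The amplitudes are hypothetical values for illustration only, and the function name is an assumption; the authors' actual analysis software for this test is not specified here.

```python
from math import sqrt
from statistics import mean, stdev


def main_effect_f(participants, factor):
    """F statistic and degrees of freedom for one main effect (2 x 2 within-subject).

    participants: list of dicts mapping (numerosity, connected) -> mean amplitude.
    """
    contrasts = []
    for cells in participants:
        if factor == "numerosity":
            high = (cells[(32, False)] + cells[(32, True)]) / 2
            low = (cells[(16, False)] + cells[(16, True)]) / 2
        elif factor == "connectedness":
            high = (cells[(16, True)] + cells[(32, True)]) / 2
            low = (cells[(16, False)] + cells[(32, False)]) / 2
        else:
            raise ValueError(f"unknown factor: {factor}")
        contrasts.append(high - low)
    n = len(contrasts)
    # One-sample t test of the contrast against zero; F = t squared.
    t = mean(contrasts) / (stdev(contrasts) / sqrt(n))
    return t ** 2, (1, n - 1)


# Hypothetical amplitudes (microvolts) for three participants, for illustration only.
demo = [
    {(16, False): 0.0, (16, True): 0.2, (32, False): 1.0, (32, True): 1.2},
    {(16, False): 0.1, (16, True): 0.0, (32, False): 2.1, (32, True): 2.0},
    {(16, False): 0.3, (16, True): 0.5, (32, False): 3.3, (32, True): 3.5},
]
f_value, df = main_effect_f(demo, "numerosity")
print(f_value, df)
```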
To assess the effects of numerosity, connectedness, and the interaction between the two outside the aforementioned predefined ROI, a regression model was run on the entire epoch using a moving window approach. ERPs corresponding to each stimulus type were entered as responses, with their amplitude averaged over shorter time windows (window width = 10 msec) spanning the entire duration of a trial (window step = 10 msec, from −100 to 400 msec). The analysis was run individually for each participant to allow testing for the significance of the distribution of beta values obtained for the different regressors. To do so, we used a cluster-based nonparametric test (Maris & Oostenveld, 2007), comparing the clusters of significant time windows emerging from the actual data with the extent of significant time windows emerging from a null distribution of beta estimates (height threshold for significance corresponding to p < .001). The null distribution was computed by randomly permuting the design matrix and repeating this procedure for 10,000 iterations. The final estimate of the beta values at the group level was computed as the average of the individual beta estimates corresponding to different regressors (βConnectedness, βNumerosity, and βInteraction).
Multivariate Pattern Analysis in the Time Domain
To achieve a better description of the temporal dynamics of the neural activity pattern, we applied a multivariate pattern analysis method (King & Dehaene, 2014), using the Neural Decoding Toolbox (Meyers, 2013). This method allowed us to evaluate how neural activity patterns coming from multiple sensors differ between different experimental conditions and how well such differences can be generalized across the time domain. More specifically, this neural decoding analysis involved the training of a support vector machine (SVM) classifier on a subset of the data corresponding to specific conditions (i.e., the presentation of a specific stimulus type) and to make predictions about which stimulus was presented in the remaining subset of data and across all the time course of activity. All the different stimuli presented across the experiment were collapsed in four categories (16 isolated, 32 isolated, 16 connected, and 32 connected), and we compared pairs of them to assess the degree to which neural activity patterns for different numerosity (i.e., 16 vs. 32 isolated and 16 vs. 32 connected) or connectedness (i.e., 16 isolated vs. 16 connected and 32 isolated vs. 32 connected) can be distinguished. Each comparison was tested separately, training the SVM classifier with the responses (i.e., activity recorded at all the channels with the exception of the EOG channels) to the two classes of stimuli at hand and testing it on another subset of trials not used in the training phase (using a leave-one-trial-out cross-validation). We followed a practice suggested in Grootswagers, Wardle, and Carlson (2017) to account for high noise in single-trial EEG data. First, we created pseudo-trials by averaging randomly selected groups of 16 trials to improve signal-to-noise ratio. Second, to avoid overfitting, the number of features (i.e., channels) included in the analysis was limited to the five most significant features computed from a univariate ANOVA. 
Third, this decoding procedure was repeated 40 times for each participant using different subsets of data for training and testing (as well as different ways to generate pseudo-trials), and the average of the 40 runs was taken as the final estimate of the decoding performance. The outcome of the decoding procedure was a temporal generalization plot showing the performance of the classifier at each time point, with classification accuracy (CA) reflecting how well the pattern classifier can discriminate two conditions (see Figure 5). Note that the procedure and SVM hyperparameters of the decoding analysis were not determined based on the current data but based on the procedures employed in a previous study from our group (Fornaciai & Park, 2018) and on the practices suggested by Grootswagers et al. (2017).
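The pseudo-trial step can be sketched as follows. This is a minimal illustration, assuming that trials of one stimulus category are shuffled, split into non-overlapping groups of 16, and averaged feature by feature, with leftover trials discarded. The handling of leftover trials and the function name are assumptions, and the toolbox used in the study may implement this differently.

```python
import random


def make_pseudo_trials(trials, group_size=16, rng=None):
    """Average random, non-overlapping groups of trials into pseudo-trials.

    trials: list of equal-length feature vectors (e.g., one value per channel).
    Trials that do not fill a complete group are dropped (assumption).
    """
    rng = rng or random.Random()
    order = list(range(len(trials)))
    rng.shuffle(order)
    pseudo_trials = []
    for start in range(0, len(order) - group_size + 1, group_size):
        group = [trials[i] for i in order[start:start + group_size]]
        # Average each feature across the trials in the group.
        pseudo_trials.append([sum(values) / group_size for values in zip(*group)])
    return pseudo_trials


# Hypothetical data: 100 trials with 4 features each.
rng = random.Random(1)
trials = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(100)]
pseudo = make_pseudo_trials(trials, rng=rng)
print(len(pseudo), len(pseudo[0]))
```

Repeating the call with a fresh shuffle yields a different grouping, which is one way to vary the pseudo-trials across repetitions of the decoding procedure.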
fMRI Data Acquisition and Analysis
In the fMRI experiment, brain images were recorded using a Siemens Magnetom Skyra 3-T MRI scanner housed in the Human Magnetic Resonance Center at the University of Massachusetts Amherst. The functional images comprised 34 axial slices and were acquired with an EPI pulse sequence measuring the BOLD T2* contrast. The following scanning parameters were used: repetition time (TR) = 2000 msec, echo time = 30 msec, slice thickness = 3 mm, field of view = 204 mm, acquisition matrix = 68 × 68, flip angle = 79°, and parallel acceleration = 2. After the first four volumes in each of the six functional runs were discarded to allow scanner equilibrium, 144 volumes were acquired in each run. High-quality T1-weighted structural images were acquired after half of the functional runs, allowing the participant to take a break from the task. Structural images were acquired with a magnetization prepared rapid gradient echo sequence using the following parameters: TR = 1800 msec, echo time = 2.13 msec, slice thickness = 1 mm, field of view = 256 mm, flip angle = 9°.
Functional and structural data were preprocessed and analyzed using SPM8 (www.fil.ion.ucl.ac.uk/spm/) on MATLAB (R2015b). First, functional images were slice-time corrected and realigned to the first volume acquired in the first run, and the structural scan was coregistered to the mean functional image of the time series. The structural image was then segmented into white and gray matter, and the gray matter was normalized into the standardized Montreal Neurological Institute space. The normalization parameters were applied to the realigned functional images, with a spatial resolution of 3 mm × 3 mm × 3 mm. Finally, normalized functional images were smoothed with a Gaussian kernel (FWHM = 6 mm). If not indicated otherwise, SPM8 default parameters were used. After preprocessing, motion parameters were closely inspected to exclude participants showing excessive motion within the scanner. The exclusion criterion for excessive motion was based on scan-to-scan displacement. The displacement threshold was set to motion >1.5 mm/TR (half voxel across two successive TRs) evident in more than 10% of the scans during a run (14/144) and in more than one run. Only one participant exceeded such a criterion and was excluded from data analysis.
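The exclusion rule can be written out as a short sketch. It takes scan-to-scan displacement values as input, because the text does not specify how displacement was derived from the realignment parameters. The function names are hypothetical, and the sketch applies a strict "more than 10%" comparison; whether exactly 14 of 144 scans would count as exceeding the criterion is not clear from the text.

```python
def run_exceeds_criterion(displacements_mm, threshold_mm=1.5, max_fraction=0.10):
    """True if more than max_fraction of scans in a run exceed the threshold."""
    n_bad = sum(d > threshold_mm for d in displacements_mm)
    return n_bad > max_fraction * len(displacements_mm)


def exclude_participant(runs):
    """True if more than one run exceeds the motion criterion."""
    return sum(run_exceeds_criterion(run) for run in runs) > 1


# Hypothetical displacement series (mm) for runs of 144 scans.
quiet_run = [0.1] * 144
noisy_run = [2.0] * 15 + [0.1] * 129
print(exclude_participant([quiet_run] * 5 + [noisy_run]))      # one noisy run
print(exclude_participant([quiet_run] * 4 + [noisy_run] * 2))  # two noisy runs
```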
To focus the analysis on a subset of brain areas relevant for our experimental question, we limited our primary analysis to four visual ROIs: V1, V2, V3, and intraparietal sulcus (IPS). These visual ROIs were defined using the probabilistic atlas provided by Wang, Mruczek, Arcaro, and Kastner (2015), with a probability cutoff of 33%. Each of the early visual ROIs collapsed all four quadrants (dorsal/ventral and left/right). The IPS ROI collapsed six subdivisions (IPS0–IPS5) in both hemispheres in the original atlas. To avoid including the same voxels in multiple ROIs, we assigned overlapping voxels to the ROIs where they show the highest probability. The four visual ROIs are depicted in Figure 6B.
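The assignment of overlapping voxels can be illustrated with a minimal sketch. It assumes that each ROI is given as a mapping from voxel identifiers to atlas probabilities, that the 33% cutoff is inclusive, and that ties go to the ROI encountered first. These details, the data layout, and the function name are assumptions made for illustration.

```python
def assign_voxels(prob_maps, cutoff=0.33):
    """Assign each voxel to the ROI where its atlas probability is highest.

    prob_maps: {roi_name: {voxel_id: probability}}
    Returns {voxel_id: roi_name} for voxels at or above the cutoff.
    """
    best = {}
    for roi, voxels in prob_maps.items():
        for voxel, probability in voxels.items():
            if probability < cutoff:
                continue  # voxel does not belong to this ROI at the chosen cutoff
            if voxel not in best or probability > best[voxel][1]:
                best[voxel] = (roi, probability)
    return {voxel: roi for voxel, (roi, _) in best.items()}


# Hypothetical probabilities for four voxels and two ROIs.
atlas = {
    "V1": {"a": 0.90, "b": 0.40, "c": 0.20},
    "V2": {"b": 0.55, "c": 0.35, "d": 0.10},
}
assignment = assign_voxels(atlas)
print(assignment)
```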
To obtain a measure of neural activity corresponding to the different classes of stimuli presented during the experiment, activation was evaluated with a general linear model (GLM), including separate regressors for each of the 16 stimulus classes, which were convolved with a canonical hemodynamic response function. In addition, motion parameters (head translation and rotation), 1-TR history of the motion parameters, and the square of both of them were added as nuisance covariates in the GLM. Neural response was evaluated by taking the mean parameter estimate values (for specific contrasts) across voxels within each ROI, for each participant. The distribution of average parameter estimate values was then tested using a 2 × 2 ANOVA full factorial design for each of the ROIs, comprising the two levels of numerosity (16 and 32 dots) and the two levels of connectedness (isolated and connected dots).
Multivoxel Pattern Analysis
As in the EEG analysis, we employed a multivariate approach (multivoxel pattern analysis [MVPA]) to evaluate the discriminability of brain activity patterns across different classes of stimuli. We used the SVM classifier provided by the libsvm package on MATLAB (R2015b; Chang & Lin, 2011). The MVPA involved training the classifier on the pattern of voxel activity corresponding to different classes of stimuli (i.e., 16 vs. 32 isolated; 16 isolated vs. 16 connected) and then testing whether the classifier can successfully predict which stimulus corresponds to novel instances of voxel activity not used for training. More specifically, each instance of a pattern for a specific stimulus class was defined by the corresponding activity in each run. This approach thus resulted in six instances for each stimulus class in each participant, as there were six runs in the experiment. In addition, to avoid overfitting, we only included in the analysis 10% of the voxels, based on their discriminative power (using a univariate t test) tested before entering the data into the MVPA routine. A leave-two-out cross-validation procedure was employed, training the classifier (using the cost parameter, C = 1) on 10 instances corresponding to two classes of stimuli (five instances each) and then testing it on the two remaining instances. This procedure was run individually for each participant and was repeated six times to cover different combinations of training and test sets. The average across all repetitions was taken as the final estimate of classification accuracy (CA) for each participant and for each comparison, indicating how well the pattern of voxel activity within each ROI allowed the classifier to successfully predict which stimulus corresponded to the instances included in the test set. The distribution of classification accuracies across the group was then tested with one-sample t tests against the null hypothesis of chance level CA (i.e., 0.5).
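The structure of the cross-validation can be sketched as follows. With six runs and one pattern instance per class per run, there are 12 instances for any two-class comparison. The sketch assumes that the two held-out instances always come from the same run (one per class), which yields six folds with 10 training and 2 test instances. This pairing and the function name are assumptions, as the text does not state how the held-out instances were chosen.

```python
def leave_one_run_out(n_runs=6, classes=("class_a", "class_b")):
    """Enumerate train/test splits that hold out both classes of one run per fold."""
    folds = []
    for test_run in range(n_runs):
        train = [
            (label, run)
            for run in range(n_runs)
            if run != test_run
            for label in classes
        ]
        test = [(label, test_run) for label in classes]
        folds.append((train, test))
    return folds


folds = leave_one_run_out()
for train, test in folds:
    print(len(train), len(test), test)
```

Averaging the accuracy over the six folds would then give one classification accuracy value per participant and comparison.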
To confirm that pairwise connections between dots can successfully change perceived numerosity as reported in previous studies (Fornaciai et al., 2016; Franconeri et al., 2009; He et al., 2009), we analyzed the magnitude of the effect induced by connectedness, measured with a psychophysical test performed by participants after completing either the EEG or fMRI experiment. In this behavioral paradigm, participants were asked to discriminate the numerosity of two simultaneously presented dot arrays: a reference containing either 16 or 32 dots, either isolated or pairwise connected, and a probe containing a variable number of dots (Figure 1B). Note that, besides the different levels of numerosity, other nonnumerical continuous magnitudes were also systematically manipulated, which can be summarized by the orthogonal dimensions of size and spacing (see Methods; Park et al., 2016; DeWind et al., 2015). Figure 2A shows the average psychometric curves obtained by pooling the data of the two groups of participants tested in the EEG and fMRI experiments. Curves describing performances with connected dots were strongly shifted toward the left, indicating a robust underestimation compared with arrays of isolated dots. Figure 2B depicts the difference in the PSE induced by connectedness across all participants. The perceived numerosity of connected items was substantially lower compared with isolated items in both conditions (16-dot reference: numerosity change = −19.2% ± 2.1%; Wilcoxon signed rank test, Z = −5.39, p < .001; 32-dot reference: numerosity change = −25.5% ± 2.2%; Z = −5.44, p < .001).
Temporal Dynamics: Regression Analysis of EEG Data
In the EEG experiment, participants watched a stream of dot-array stimuli comprising different numerosities (16 or 32 dots), either isolated or pairwise connected, and performed an oddball detection task on a dimension (i.e., color) irrelevant to the dimension of interest (i.e., numerosity). Figure 3A shows the brainwaves over the midline occipital channel Oz, our primary channel of interest given previous findings (Fornaciai et al., 2017; Park et al., 2016), for the four stimulus categories. As can be seen from the brainwaves, the ERPs evoked by different absolute numbers of dots (16 vs. 32) were dissociable early in the time course (<100 msec) regardless of whether the stimuli were connected or not. In stark contrast, at later latencies (100–200 msec), the ERPs differed more based on whether the dots were isolated or connected than on the absolute number of dots.
To address our central research question concerning the extremely early visual cortical activity, we assessed the effects of numerosity and connectedness using a two-way repeated-measures ANOVA on the ERPs at channel Oz from the latency window of 70–100 msec, again defined based on previous findings (Fornaciai et al., 2017; Park et al., 2016; for details, see ROIs in Methods). Specifically, all the stimuli presented during the experiment were collapsed into four classes (i.e., 16 isolated, 32 isolated, 16 connected, and 32 connected), and ERPs corresponding to these different classes of stimuli were averaged across a target latency window defined to include the peaks of early numerosity-sensitive activity found in previous studies (Fornaciai et al., 2017; Park et al., 2016). The results showed a significant main effect of Numerosity, F(1, 19) = 23.04, p < .001, but no main effect of Connectedness, F(1, 19) = 1.26, p = .28, and no interaction between the two, F(1, 19) = 1.89, p = .18. These results indicate that early cortical activity is modulated by the numerosity of the stimuli, with very little, if any, effect of the connectedness illusion.
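In a 2 × 2 within-participant design, each effect has one degree of freedom, so its F(1, n − 1) statistic equals the squared one-sample t statistic of a per-participant contrast score. The Python sketch below relies on that identity. The amplitudes are hypothetical, and the function names are illustrative rather than taken from the analysis pipeline used here; p values are not computed.

```python
import math
from statistics import mean, stdev


def contrast_f(scores):
    """F(1, n - 1) for a 1-df within-participant contrast (squared one-sample t)."""
    n = len(scores)
    t = mean(scores) / (stdev(scores) / math.sqrt(n))
    return t * t, (1, n - 1)


def rm_anova_2x2(cells):
    """cells: one (iso16, iso32, con16, con32) tuple of mean amplitudes per participant."""
    numerosity = [(i32 + c32) - (i16 + c16) for i16, i32, c16, c32 in cells]
    connectedness = [(c16 + c32) - (i16 + i32) for i16, i32, c16, c32 in cells]
    interaction = [(c32 - c16) - (i32 - i16) for i16, i32, c16, c32 in cells]
    return {
        "Numerosity": contrast_f(numerosity),
        "Connectedness": contrast_f(connectedness),
        "Interaction": contrast_f(interaction),
    }


# Hypothetical mean amplitudes (microvolts) for four participants.
cells = [
    (1.0, 2.0, 1.1, 2.3),
    (0.5, 1.8, 0.4, 1.5),
    (1.2, 2.1, 1.5, 2.2),
    (0.8, 2.2, 0.7, 2.4),
]

result = rm_anova_2x2(cells)
for effect, (f_value, df) in result.items():
    print(f"{effect}: F{df} = {f_value:.2f}")
```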
We then performed a point-by-point regression analysis to provide a more comprehensive picture of the observed effect across the entire epoch (Figure 3B). This model included numerosity (16 or 32 dots), connectedness (0 or 1 categorical coding), and the interaction between the two as regressors to explain the variance in the ERPs. The distribution of beta values obtained with the regression analysis was tested by means of a cluster-based nonparametric test, performed across the entire time course of activity. At channel Oz, a significant effect of Numerosity emerged early in the time course, peaking at around 105 msec poststimulus (75–125 msec, peak βN = −1.21; p < .0001, average adjusted R² = .51 ± .28), consistent with the significant effect found within our primary latency window of interest. The effect of Numerosity was followed by a significant effect of Connectedness (115–175 msec, peak βC = −2.50; p < .0001, average adjusted R² = .81 ± .10) and a significant interaction (135–165 msec, peak βI = −0.91; p = .0001, average adjusted R² = .81 ± .10), both peaking at around 150 msec. Interestingly, the significant interaction at around 150 msec shows that, at this stage, neural responses are not explained by the numerical magnitude of the stimuli or by the presence/absence of connecting lines alone but are mostly determined by the combination of numerosity and connectedness. In addition, a later positive peak of the Connectedness effect was evident at about 205 msec poststimulus (195–215 msec, peak βC = 0.97; p = .0004, average adjusted R² = .43 ± .24), followed by a later peak of the Numerosity effect around 245 msec (245–256 msec, peak βN = −0.67; p < .0001, average adjusted R² = .47 ± .25). These results show that modulations of numerosity and connectedness both have a strong effect on neural responses, although with different timing.
Channels near Oz, such as O1′ and O2′ (figure not shown), similarly showed a significant early effect of Numerosity (peak at 105 msec; p < .001), followed by the effect of Connectedness and the interaction between Numerosity and Connectedness at about 155 msec (p < .001 in all cases). Moreover, as at Oz, a later positive peak of the connectedness effect was evident at about 205 msec poststimulus at both channels O1′ (p < .001) and O2′ (p = .008). This demonstrates that the pattern of results highlighted at Oz is not limited to this specific channel but extends to nearby channels as well (as also indicated in Figure 4).
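To make the structure of the point-by-point model concrete, the following Python sketch computes the regression coefficients at each time point from the four condition means. It assumes 0/1 dummy coding for both factors (the text above specifies this coding only for connectedness, so the numerosity coding is an assumption). Because the model with both main effects and their interaction is saturated for a 2 × 2 design, ordinary least squares reproduces the cell means, and the coefficients reduce to simple differences between them. The waveforms are hypothetical, and the cluster-based test is not included.

```python
def betas_at_each_timepoint(erp):
    """Coefficients of the saturated 2 x 2 model, one dict per time point.

    erp maps (numerosity, connected) -> list of mean amplitudes over time,
    with numerosity in {16, 32} and connected in {0, 1}.
    Assumed coding: numerosity 16 -> 0 and 32 -> 1; isolated -> 0, connected -> 1.
    """
    betas = []
    series = zip(erp[(16, 0)], erp[(32, 0)], erp[(16, 1)], erp[(32, 1)])
    for iso16, iso32, con16, con32 in series:
        betas.append({
            "intercept": iso16,
            "numerosity": iso32 - iso16,        # effect of 32 vs. 16 for isolated dots
            "connectedness": con16 - iso16,     # effect of lines for 16 dots
            "interaction": (con32 - con16) - (iso32 - iso16),
        })
    return betas


# Hypothetical mean amplitudes at three time points (illustration only).
erp = {
    (16, 0): [0.0, 1.0, 2.0],
    (32, 0): [0.0, 0.0, 1.0],
    (16, 1): [0.0, 1.0, 0.0],
    (32, 1): [0.0, 0.0, -2.0],
}

betas = betas_at_each_timepoint(erp)
for t, b in enumerate(betas):
    print(t, b)
```

In this toy example the numerosity coefficient departs from zero at the second time point, whereas the connectedness and interaction coefficients do so only at the third, mimicking the ordering of effects described above.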
To obtain a comprehensive view of the observed effects across the channels, we plotted topographic maps of beta estimates. As shown in Figure 4A, the effect of numerosity showed two main peaks: one around 75–125 msec over medial occipital channels and another around 150–175 msec over occipito-parietal scalp sites. This pattern is consistent with earlier findings demonstrating two separate peaks of numerosity-sensitive activity at about 75 and 180 msec poststimulus (Park et al., 2016). In contrast, the effect of connectedness (Figure 4B) was mostly evident around 125–175 msec over medial occipital scalp sites. Finally, the topographic distribution of the interaction effect was much like that of the connectedness effect (Figure 4C). It should be noted that top-perspective topographic maps (not shown in figures) show no noteworthy effect of numerosity, connectedness, or interaction, confirming that the effects of experimental manipulations are maximally distributed in occipital regions.
Temporal Generalization Analysis in the Time Domain
A multivariate neural decoding analysis was performed to characterize the dynamics of neural representations across all the recording channels (see Methods). In this decoding analysis, an SVM classifier was used to test how well two classes of neural activity patterns are distinguishable (e.g., 16 vs. 32 isolated or 16 isolated vs. 16 connected), using CA as a metric for that distinctiveness. The comparison of 16 versus 32 isolated conditions, together with the comparison of 16 versus 32 connected conditions, allowed us to infer how the neural activity patterns for two different numerosities differed. The comparison of 16 isolated versus 16 connected conditions, together with the comparison of 32 isolated versus 32 connected conditions, allowed us to infer how the neural activity patterns for isolated and connected items differed.
Figure 5A shows the pattern of CA obtained by training and testing the classifier at the same time point. In the case of the numerosity contrast (i.e., different number of dots regardless of their connectedness), CA showed a first peak at about 95 msec (average peak accuracy = 0.64 ± 0.067), consistent with the results of the regression analysis, followed by a sustained stream of above-chance CA until later in the time course. In contrast, the neural decoding pattern corresponding to the connectedness effect (i.e., same number of isolated vs. connected dots) peaked slightly later in the time course at 150 msec (average peak accuracy = 0.71 ± 0.09), again consistent with the results of the regression analysis. The CA was generally greater in distinguishing the effect of connectedness than in distinguishing the effect of numerosity, suggesting that the processes triggered by the connectedness illusion are much more robust compared with the raw numerical representation.
To assess the extent to which patterns of activity generalize across multiple time windows, we also obtained temporal generalization plots by training the classifier at one time point and testing it across all the time points (Figure 5B). Whereas training and testing the classifier at the same time point (which represents the diagonals of the temporal generalization plots shown in Figure 5B) provides information about the dynamics of brain activity patterns across the analyzed time course, the temporal generalization provides further information about potential recurring patterns and sustained stages of activation over time (King & Dehaene, 2014). The temporal generalization plots (Figure 5B) first confirmed the stronger effect in the connectedness comparison. One additional noticeable point here is that, at earlier latencies, above-chance decoding was mostly distributed along the diagonal in both the numerosity and connectedness cases, suggesting that these dot-array stimuli are likely represented by a series of processing stages with unique patterns of activity through the visual stream. On the other hand, the pattern of CA showed a larger generalization across time points later in the time course (∼300 msec), likely representing the involvement of a relatively sustained processing stage. Note that the effects represented in the diagonal of a temporal generalization plot in Figure 5B reflect the decoding results shown in Figure 5A.
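The logic of temporal generalization can be sketched in a few lines of Python. In this sketch, a nearest-centroid classifier is substituted for the SVM to keep the example free of external dependencies, a single train/test split replaces cross-validation, and the two-channel data are hypothetical. The class signal is constructed to reverse polarity after the first time point and then remain stable.

```python
def centroid(trials, t):
    """Mean channel pattern at time index t; each trial is trial[channel][time]."""
    n_channels = len(trials[0])
    return [sum(tr[ch][t] for tr in trials) / len(trials) for ch in range(n_channels)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def temporal_generalization(train, test, n_times):
    """Return acc[t_train][t_test] for a nearest-centroid classifier."""
    labels = list(train)
    n_test = sum(len(test[label]) for label in labels)
    acc = [[0.0] * n_times for _ in range(n_times)]
    for t_train in range(n_times):
        cents = {label: centroid(train[label], t_train) for label in labels}
        for t_test in range(n_times):
            correct = 0
            for true_label in labels:
                for trial in test[true_label]:
                    pattern = [channel[t_test] for channel in trial]
                    guess = min(labels, key=lambda lab: sq_dist(pattern, cents[lab]))
                    correct += guess == true_label
            acc[t_train][t_test] = correct / n_test
    return acc


def make_trial(sign, jitter):
    """Hypothetical trial: channel 0 carries the class signal, which flips after t = 0."""
    ch0 = [sign + jitter, -sign + jitter, -sign + jitter]
    ch1 = [jitter, -jitter, jitter]
    return [ch0, ch1]


train = {"16": [make_trial(+1, j) for j in (-0.2, 0.1)],
         "32": [make_trial(-1, j) for j in (0.2, -0.1)]}
test = {"16": [make_trial(+1, 0.05)], "32": [make_trial(-1, -0.05)]}

acc = temporal_generalization(train, test, n_times=3)
for row in acc:
    print(row)
```

The diagonal of the resulting matrix corresponds to training and testing at the same time point. In this toy case, the classifier trained at the first time point does not transfer to later ones, whereas classifiers trained at the second and third time points transfer to each other, which is the signature of a sustained pattern of activity.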
Anatomical Correlates: fMRI Results
Overall, the EEG results showed that the early-latency ERPs are exclusively modulated by numerosity irrespective of connectedness, suggesting that the segmentation processes necessary to turn the raw visual input into perceptual units emerge only later in the processing stream. Indeed, as shown by both the regression and decoding analyses, the effect of connectedness emerged at later latencies (150 msec), well beyond the early numerosity-sensitive signals highlighted in previous studies (Fornaciai et al., 2017; Park et al., 2016). Previous results (Fornaciai et al., 2017) pinpointed such early activity as arising from early visual areas such as V2 or V3. To identify the roles of different visual areas in representing the raw numerical information versus the perceptual units necessary for numerosity perception, we further investigated the anatomical correlates of the connectedness illusion. To this aim, we performed an independent fMRI experiment using the same paradigm and the same set of stimuli as in the EEG experiment.
In the fMRI experiment, participants performed the same task viewing the same dot-array stimuli as in the EEG experiment. The only difference was a longer range of SOAs, which accommodated the sluggish hemodynamic response (see Methods). Participants' BOLD signal associated with each of the four experimental conditions (i.e., 16 isolated, 16 connected, 32 isolated, and 32 connected) was modeled using a GLM, and the regression (beta) coefficients were interpreted as a proxy for neural activity. We then assessed the effects of numerosity and connectedness in anatomically defined ROIs comprising early visual areas V1, V2, and V3, as well as the IPS (Figure 6B). The early visual ROIs were chosen given the medial occipital effects obtained in the current EEG experiment (Figure 4) and the recent evidence for numerosity processing primarily in V2 and V3 (Fornaciai et al., 2017). Besides these primary ROIs, we also examined the effects in the IPS for its relevance in numerosity processing (Harvey et al., 2013; Piazza, Pinel, Le Bihan, & Dehaene, 2007; Piazza et al., 2004).
Neural responses representing the encoding of numerosity were obtained by extracting the average regression coefficients (from the GLM in fMRI analysis) for different levels of numerosity (i.e., 16 and 32 dots) and for different levels of connectedness (i.e., isolated and connected), within each of the selected ROIs (V1, V2, V3, and IPS). We then assessed the neural responses to the different classes of stimuli using a 2 × 2 repeated-measures ANOVA, separately for each ROI. However, none of the ROIs showed a significant effect of Numerosity, F(1, 18) < 1.26, ps > .27, or Connectedness, F(1, 18) < 3.89, ps > .064, or a significant interaction between the two factors, F(1, 18) < 2.61, ps > .12. These weak effects suggest that the average activity across the different ROIs might not be sensitive enough to pick up a relatively small effect caused by the parametric modulation of our dot-array stimuli.
As in the EEG experiment, we adopted a multivariate approach to achieve better sensitivity to the brain activity elicited by different experimental conditions. Indeed, MVPA has already been successfully employed in previous fMRI studies to investigate numerical processing (e.g., Castaldi, Aagten-Murphy, Tosetti, Burr, & Morrone, 2016; Bulthé, De Smedt, & Op de Beeck, 2014; Eger et al., 2009) and is thought to provide a much more sensitive index of how brain activity is modulated by different experimental manipulations (Davis et al., 2014). In this MVPA, an SVM classifier was used to assess the discriminability of two classes of brain response patterns within each ROI. The discriminability of neural activity patterns based on numerosity (i.e., 32 vs. 16 isolated combined with 32 vs. 16 connected) was significantly above chance in V2 and V3 (Figure 6A, left; CA = 0.58 ± 0.02, and 0.59 ± 0.02; one-sample t test: t(18) = 3.62, p = .002, and t(18) = 3.70, p = .0016, respectively, for V2 and V3), indicating that the pattern of activation across voxels within these areas can successfully discriminate between different absolute numbers of dots, regardless of their connectedness. The discriminability of neural activity patterns based on the effect of connectedness (i.e., 16 isolated vs. 16 connected combined with 32 isolated vs. 32 connected) was significantly above chance in V3 only (Figure 6A, right; CA = 0.57 ± 0.02, t(18) = 2.87, p = .01). Apart from these cases, no ROI, including the IPS, showed above-chance discrimination of neural activity patterns for numerosity or connectedness (ps > .11).
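The group-level test of decoding performance against chance can be sketched as follows in Python. The sketch assumes that the two pairwise classifications entering each contrast are combined by averaging their accuracies within each participant, which is one plausible reading of "combined with" rather than a documented detail, and the accuracy values are hypothetical. Only the t statistic and its degrees of freedom are computed; obtaining the p value would require a t distribution, which is outside the standard library.

```python
import math
from statistics import mean, stdev


def combined_accuracy(pairs):
    """Average two pairwise classification accuracies per participant (assumed rule)."""
    return [(a + b) / 2 for a, b in pairs]


def one_sample_t(values, chance=0.5):
    """One-sample t statistic against chance level, with its degrees of freedom."""
    n = len(values)
    t = (mean(values) - chance) / (stdev(values) / math.sqrt(n))
    return t, n - 1


# Hypothetical accuracies for five participants:
# (32 vs. 16 isolated, 32 vs. 16 connected) within one ROI.
numerosity_pairs = [(0.55, 0.57), (0.62, 0.58), (0.50, 0.54), (0.65, 0.61), (0.56, 0.60)]

ca = combined_accuracy(numerosity_pairs)
t_value, df = one_sample_t(ca)
print(f"mean CA = {mean(ca):.3f}, t({df}) = {t_value:.2f}")
```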
Numerical magnitude is one of the fundamental attributes of the visual world, yet a comprehensive account of the neural pathway leading to the representation of numerosity is still lacking. Recent studies demonstrate robust neurophysiological correlates of numerosity perception, with numerosity-sensitive brain signals emerging at multiple stages during the time course of activity. For instance, earlier studies have shown a robust neural signature of numerical processing in brain signals evoked by dot-array stimuli (e.g., Hyde & Spelke, 2009; Libertus, Woldorff, & Brannon, 2007; Temple & Posner, 1998), particularly at the level of the P2p component, an ERP component consistently associated with approximate numerical processing. More recently, it has been shown that numerosity-sensitive activity can also be observed much earlier after stimulus onset, as early as 75 msec (Park et al., 2016). Later studies identified this activity as arising from early visual areas, such as V2 and V3, with possible contributions even from V1 (Fornaciai et al., 2017). However, it has been demonstrated that numerosity perception is not simply based on the physical number of items present in the visual scene but rather on segmentation processes defining a set of “perceptual units” according to the topological properties of the visual images (i.e., as in the case of items connected by lines or items enclosed within the same boundary; Fornaciai et al., 2016; He et al., 2009, 2015; Franconeri et al., 2009). It is thus unclear whether the extremely early activity highlighted in previous studies actually reflects such perceptual units forming the basis of numerosity perception or a raw sensory representation of the visual stimuli before segmentation. Indeed, only in the former case could a neural signature be regarded as a sufficient correlate of numerosity perception.
To investigate whether early neural activity for numerosity in the low-level visual cortex represents a sufficient correlate of numerosity perception, we exploited the connectedness illusion (Fornaciai et al., 2016; He et al., 2009, 2015; Franconeri et al., 2009), which is based on one of the most basic principles of perceptual organization (i.e., uniform connectedness; Palmer & Rock, 1994; Wertheimer, 1912). Numerosity perception and its connectedness illusion provide an ideal condition to address this issue, because there is a clear distinction between raw sensory information and the perceptual units giving rise to the connectedness illusion and because the effect of this illusion can be manipulated parametrically. A first question, however, is “Does an illusion like the one provided by connectedness on numerical estimates occur automatically even in a passive viewing paradigm such as the one employed in the current study?” Although we do not have direct evidence for this, segmentation due to connectedness is one of the very basic processes of low-level vision, organizing the visual input through a series of self-generated inferences about the nature of external objects (for a review, see Frégnac & Bathellier, 2015). Given this, it is very likely that the connectedness illusion would arise automatically and independently of a task, as also suggested by the unavoidable nature of this illusion in a numerosity discrimination task (i.e., participants' numerical estimates are biased by connectedness even if instructed to ignore the lines as much as they can; Franconeri et al., 2009).
The results of our EEG experiment from both the regression (Figures 3 and 4) and multivariate decoding (Figure 5) analyses show a striking dissociation between neural activity reflecting raw numerosity (e.g., illustrated in the contrast of 16 vs. 32 isolated dots) and the neural activity reflecting the connectedness illusion (e.g., illustrated by the contrast of 16 isolated vs. 16 connected dots). The neural activity pattern representing numerosity emerges early in the visual stream (∼100 msec) followed by an effect of connectedness at a later latency (∼150 msec), with largely overlapping topographic distributions. The results from our fMRI experiment further demonstrate that, whereas numerosity is represented in both visual areas V2 and V3, the effect of connectedness emerges only in V3. Thus, collectively, our results suggest that early visual signals (<100 msec), arising from areas V2 and V3, do not represent a sufficient correlate of numerosity perception. Instead, segmentation processes creating the perceptual units necessary for numerosity perception, as highlighted by the connectedness illusion, emerge only later in the visual stream, around or after 150 msec post stimulus onset. Concerning the fMRI results, it is interesting to note the large difference in sensitivity between the univariate and multivariate procedures employed in our analysis. The better performance of the multivariate analysis is, however, not surprising, as this technique makes it possible to exploit voxel-level variability and discard participant-level variability (Davis et al., 2014), providing a more sensitive tool for detecting experimental manipulations whose effects on the overall level of activity may be too small to be detected otherwise.
One plausible explanation for this pattern of results is that numerosity information first processed in V2/V3 in a feedforward manner (at or before 100 msec) gets processed again within that same cortical region (V3) after reentrant feedback from other areas (around 150 msec). This idea is consistent with previous studies highlighting the role of reentrant signals for processes like figure-ground segmentation (Supèr & Romeo, 2011; Fahrenfort, Scholte, & Lamme, 2007; Lamme, Zipser, & Spekreijse, 2002). In particular, Fahrenfort et al. (2007) highlighted at least three stages of brain processing related to figure-ground segmentation: a first, purely feedforward stage occurring at latencies before 110 msec, followed by a processing stage characterized by feedback signals coming back to early visual areas after 110 msec. The timing highlighted by Fahrenfort et al. (2007) closely matches the timing of our numerosity and connectedness effects, thus supporting the idea that feedback signals are necessary to segment the image into perceptual units setting the bases for numerosity representation. What is the origin of these reentrant signals? One possibility is that feedback signals arise from higher-level areas such as the parietal or frontal cortex. However, given the tight timing between the two processing stages, a more plausible explanation is that the reentrant feedback arises from the interaction between midlevel visual areas along the dorsal (i.e., V3A, MT) and ventral (i.e., V4, LOC) stream, which process different visual features. 
This idea is consistent with recent behavioral observations suggesting the involvement of multiple visual areas in the early levels of numerosity processing (Fornaciai & Park, 2017b) and with the fact that visual area V3 represents a crucial node in the visual pathway, with the peculiar features of neurons in this area (i.e., sensitivity to multiple features of the stimuli like color and motion) making it an intersection between the dorsal and ventral streams (Gegenfurtner, Kiper, & Levitt, 1997). Thus, according to this idea, the reentrant feedback signals from a network of midlevel visual areas to V3 mediate the object segmentation processes providing the source of the perceptual units underlying numerosity perception. However, although this interpretation accounts very well for our results, a word of caution is in order when interpreting the activity in V3 observed with fMRI in light of the timing of the connectedness effect observed with EEG. The two experiments were performed separately on independent groups of participants, which makes it difficult to directly combine the results of the two experiments. Examining the role of feedforward and feedback processing in visual area V3 by combining different neuroimaging techniques is a subject for future work.
How does this reentrant processing fit into the numerosity processing stream highlighted by previous studies? For instance, Park et al. (2016) demonstrated two distinct stages of numerosity perception, first around 75 msec after stimulus onset followed by a later stage around 185 msec, and Fornaciai et al. (2017) showed that the first of these two stages occurs in early visual areas such as V2 or V3. In this context, the effect of connectedness emerging at around 150 msec may lie on the information transformation pathway between the two numerosity processing stages observed in previous studies (i.e., 75–88 and 185–215 msec; Fornaciai et al., 2017; Park et al., 2016). Interestingly, although the later processing stage has been interpreted as a signature of activity in parietal cortex, consistent with results showing activity in the IPS (Harvey et al., 2013; Piazza et al., 2004), we did not observe any significant effect of numerosity or connectedness in the IPS. This lack of parietal activation in the context of our passive viewing paradigm is consistent with recent studies showing little or no IPS activation in the absence of an explicit task (DeWind, Park, Woldorff, & Brannon, 2018; Cavdaroglu et al., 2015) and raises the possibility that a decision process is necessary for the involvement of the IPS (also see Göbel, Johansen-Berg, Behrens, & Rushworth, 2004).
Although the current findings reject the hypothesis that early visual cortical activity represents a sufficient correlate of numerosity perception and rather illustrate that the neural signature for perceptual segmentation occurs only at or after 150 msec in area V3, does this mean that this later activity is sufficient to give rise to the subjective perceptual experience of numerosity? Unfortunately, our results alone cannot determine whether those representations are sufficient to give rise to subjective perceptual experience, because the passive viewing nature of our paradigm prevents us from assessing whether numerosity-sensitive neural responses at different levels are correlated with the actual numerosity experienced by participants. Further processing stages may be required to reach visual awareness. Nevertheless, considering previous studies addressing the neurophysiological correlates of visual awareness (see Koivisto & Revonsuo, 2010, for a review), there is evidence to argue that neural representations in V3 around 150 msec may form a sufficient basis to support the subjective conscious experience of numerosity. For instance, the most widely acknowledged ERP correlate of visual awareness emerging from studies comparing seen versus unseen stimuli is the visual awareness negativity, peaking between 150 and 200 msec over occipito-parietal sites (although the timing and scalp topography are highly variable across different studies; Pitts, Metzler, & Hillyard, 2014; Railo, Koivisto, & Revonsuo, 2011), and the timing of which is relatively consistent with the peak of the connectedness effect (150 msec). Similarly, studies investigating illusory percepts show the involvement of midlevel vision processing stages (145–155 msec) in representing illusory information such as illusory contours (Halgren, Mendola, Chong, & Dale, 2003), again consistent with the timing of the connectedness illusion observed in our results. 
However, because our results do not allow us to draw strong conclusions about whether signals in V3 may underlie the emergence of numerosity information into visual awareness, this remains an intriguing question for future studies. Regardless of this possibility, we can conclude that relatively early processing occurring in visual area V3 is the earliest processing stage where we can trace a sufficient correlate of numerosity perception, reflecting the perceptual units setting the bases for numerical perception.
Overall, we have demonstrated that numerosity perception undergoes at least two distinct stages: first, the extraction of raw sensory information of a dot-array stimulus early in the visual stream (<100 msec in V2/V3), which does not yet form a sufficient basis for numerosity perception and, second, the segmentation of the stimulus into perceptual units slightly later in the visual stream (150 msec in V3) that are likely to serve as a foundation for perceived representation of numerosity. These results thus highlight the active and constructive nature of perception, whereby numerical information passes through different processing stages in early visual cortex before being turned into a suitable format for numerosity perception.
We thank Brynn Boutin, Courtney McMahon, and Bri Southwick for their assistance in EEG data collection and Dr. Kwan-Jin Jung for his advice and assistance in fMRI data collection. We also thank Drs. Kyle Cave and Marty Woldorff for helpful comments and discussions. This study was supported by the National Science Foundation CAREER Award BCS1654089 to J. P.
Reprint requests should be sent to Michele Fornaciai, Department of Psychological and Brain Sciences, University of Massachusetts, 135 Hicks Way, Amherst, MA 01003, or via e-mail: firstname.lastname@example.org.