Objective: Incidental learning of spatiotemporal regularities and consistencies—also termed ‘statistical learning’—may be important for discovering the causal principles governing the world. We studied statistical learning of temporal structure simultaneously at two time-scales: the presentation of synthetic visual objects (3 s) and predictive temporal context (30 s) in the order of appearance of such objects.

Methods: Visual objects were complex and rotated in three dimensions about varying axes. Observers viewed fifteen (15) objects recurring many times each, intermixed with other objects that appeared only once, while whole-brain BOLD activity was recorded. Over three successive days, observers grew familiar with the recurring objects and reliably distinguished them from others. As reported elsewhere (Kakaei & Braun, 2024), representational similarity analysis (RSA) of multivariate BOLD activity revealed 124 ‘object-selective’ brain parcels with selectivity for recurring objects, located mostly in the ventral occipitotemporal cortex and the parietal cortex.

Main results: Here, we extend RSA to the representation of predictive temporal context, specifically “temporal communities” formed by objects that tended to follow each other. After controlling for temporal proximity, we observed 27 ‘community-sensitive’ brain parcels, in which pairwise distances between multivariate responses reflected community structure, either positively (smaller distances within than between communities) or negatively (larger distances within). Among object-selective parcels, 11 parcels were positively community-sensitive in the primary visual cortex (2 parcels), the ventral occipital, lingual, or fusiform cortex (8 parcels), and the inferior temporal cortex (1 parcel). Among non-object-selective parcels, 12 parcels were negatively community-sensitive in the superior, middle, and medial frontal cortex (6 parcels), the insula (2 parcels), the putamen (1 parcel), and in the superior temporal or parietal cortex (3 parcels).

Conclusion: We conclude that cortical representations of object shape and of predictive temporal context are largely coextensive along the ventral occipitotemporal cortex.

Even when sensory stimuli are experienced passively, without task or reward, they can modify the underlying neural pathways and alter subsequent sensory performance and behavior (e.g., Conway & Christiansen, 2005; Lengyel et al., 2019; Li & DiCarlo, 2012). This incidental and automatic type of plasticity has been termed ‘statistical learning’ or ‘implicit learning’ (for reviews, see Aslin, 2017; Fiser & Lengyel, 2022; Perruchet, 2019; Perruchet & Pacton, 2006; Saffran & Kirkham, 2018; A. Schapiro & Turk-Browne, 2015). Some theories of cognitive development hypothesize that incidental learning during everyday experience captures the causal processes and relationships underlying sensory observations at a more abstract level (Kemp & Tenenbaum, 2008; Tenenbaum et al., 2011). If so, statistical learning might contribute to higher cognitive function by taking on the character of “structural learning”, which could underpin learning from examples, generalizing between domains, or gaining causal insight and understanding (Lake et al., 2017; Shafto et al., 2011).

A well-studied instance of incidental learning is the view-invariance of visual object recognition (for reviews, see DiCarlo et al., 2012; Gauthier & Tarr, 2016; Logothetis & Sheinberg, 1996). Humans and non-human primates typically recognize visual objects from different viewing directions and distances, presumably relying on characteristic features and/or their spatial relationships. This perceptual invariance can be modified rapidly by the experience of contiguous sequences of different views, demonstrating dependence on learning (Tian & Grill-Spector, 2015; Wallis & Bülthoff, 2001; Wallis et al., 2009). The neural representation of visual shape in the ventral occipitotemporal cortex is similarly view-invariant and equally subject to modification by the recent experience of (natural or unnatural) sequences of views (Jia et al., 2021; Li & DiCarlo, 2008, 2010, 2012; Op de Beeck & Baker, 2010; Van Meel & Op de Beeck, 2018, 2020).

Incidental learning is not limited to individual objects but extends also to spatiotemporal configurations of multiple objects. When human observers experience temporal sequences or spatial arrays of visual objects, task-irrelevant statistical regularities and contingencies are learned rapidly (within minutes), as can be revealed by subsequent behavioral tests (Fiser & Aslin, 2001, 2002, 2005; Kakaei et al., 2021; Sáringer et al., 2022; Turk-Browne et al., 2005, 2009). In non-human primates, the experience of task-irrelevant temporal dependencies modifies object-specific responses of neurons in visual areas of the ventral temporal cortex, but also in multimodal areas of the medial temporal lobe (Erickson & Desimone, 1999; Kaposvari et al., 2018; Meyer et al., 2014; Miyashita, 1988; Sakai & Miyashita, 1991). In human observers, functional imaging evidence reveals that task-irrelevant temporal dependencies can modulate BOLD responses in visually selective areas of the ventral occipital cortex, as well as in multimodal areas such as the medial temporal lobe, hippocampus, and basal ganglia (Gheysen et al., 2011; Giorgio et al., 2018; Hindy et al., 2016; Hsieh et al., 2014; Karlaftis et al., 2019; Turk-Browne et al., 2009, 2010; A. C. Schapiro et al., 2012; R. Wang et al., 2017). Statistical learning goes beyond first-order dependencies (between immediate temporal neighbors) and extends to higher-order dependencies (between more distant neighbors). For example, A. C. Schapiro et al. (2013, 2016) demonstrated statistical learning of clusters of dependencies (“temporal communities”) and observed BOLD correlates of this predictive temporal context in associative areas of the frontal and temporal lobes and in the hippocampus, but not in visually selective areas of the ventral occipitotemporal cortex.

Here, we investigate statistical learning by human observers with temporal sequences of visual objects, seeking to compare neural correlates of learning at the levels of individual visual objects and of higher-order temporal dependencies. Unlike previous work, we focus on the visual pathways in the ventral occipitotemporal cortex, the major neural substrate of visual experience and long-term memory (reviewed by Bi et al., 2016; Grill-Spector & Weiner, 2014; Kravitz et al., 2013; Weiner & Zilles, 2016). We hypothesize that learning of visual shapes (i.e., spatiotemporal relationships of characteristic features) might interact with the learning of the context in which such shapes appear (i.e., spatiotemporal configurations of distinct shapes) (Miyashita, 1988).

Our visual stimuli were synthetic, three-dimensional objects of unique and characteristic shape that rotated slowly about varying axes. Over three successive sessions/days, observers viewed 15 ‘recurring’ objects approximately 200 times each, as well as 360 ‘non-recurring’ objects once each, while attempting to classify each presented object as either ‘familiar’ (recurring) or ‘novel’ (non-recurring). As reported previously (Kakaei & Braun, 2024; Kakaei et al., 2021), observers quickly gained familiarity with recurring objects and learned to recognize their characteristic shape from all points of view.

We recorded whole-brain BOLD activity during all three sessions/days and analyzed this activity in terms of 758 functionally defined brain parcels (Dornas & Braun, 2018), which on average comprised approximately 200 voxels and 1.7 cm³ of gray matter. ‘Representational similarity analysis’ (RSA; Haxby, 2012; Kriegeskorte et al., 2008) was used to quantify the information encoded by the BOLD activity of each brain parcel, specifically, the 9 s of multivariate activity following object presentation. For every brain parcel, this analysis was carried out in a lower-dimensional subspace chosen to maximize the differences between BOLD responses (‘optimal subspace’; Yu & Yang, 2001). A cross-validated analysis identified 124 (of 758) brain parcels that were ‘identity-selective’ in the sense that object identity could be decoded from BOLD activity in the majority of observers: 90 parcels in the ventral occipitotemporal cortex, 28 parcels in the parietal cortex, and 6 parcels in frontal and other regions. The detailed results are reported in a companion study (Kakaei & Braun, 2024).

To investigate the effect of predictive temporal context, we manipulated the order in which objects were presented, adapting the paradigm of A. C. Schapiro et al. (2013). For three sessions/days, the sequence of recurring objects was generated so as to form ‘temporal communities’, in the sense that every object was likely to be followed by other objects from the same community (“structured condition”). For three further sessions/days, the presentation sequence was fully random, so that every object was equally likely to be followed by any other object (“unstructured condition”).

To ascertain the effect of ‘temporal communities’ on multivariate BOLD activity, we analyzed pairwise distances in the ‘optimal subspace’ and quantified ‘community sensitivity’ in terms of the difference between average response distances to objects in different communities and in the same community. After controlling for effects of temporal proximity, we found significant ‘community sensitivity’ in 27 of 758 brain parcels. In particular, we observed positive community sensitivity (i.e., smaller distances within than between communities) in 11 ‘identity-selective’ parcels in the ventral occipitotemporal cortex, as well as negative community sensitivity (i.e., larger distances within than between communities) in 12 non-identity-selective parcels of the frontal cortex, the insula, the putamen, and the superior temporal or parietal cortex. A further 4 parcels exhibited other combinations of community sensitivity and identity selectivity.

We conclude that the ventral occipitotemporal cortex harbors largely coextensive representations of both the identity of objects and of statistical regularities in the order of their appearance.

The experimental paradigm and procedure are described in detail elsewhere (Kakaei & Braun, 2024). Here, we only summarize the most pertinent aspects.

2.1 Observers and behavior

Eight healthy observers (4 female and 4 male; aged 25 to 32 years) took part in behavioral training (‘sham experiment’, two sessions per observer), the functional imaging experiment (‘main experiment’, six scanning sessions per observer), and a final behavioral assessment (two sessions). All participants were paid and gave informed consent. Ethical approval was granted under Chiffre 30/21 by the ethics committee of the Faculty of Medicine of the Otto-von-Guericke University, Magdeburg.

In both sham and main experiments, observers viewed sequences of 200 recurring and non-recurring objects (see below and Fig. 1A) and attempted to classify each object as ‘familiar’ or ‘novel’ (by pressing the appropriate button). Over the course of multiple sessions, observers gradually became familiar with recurring objects and thus became able to distinguish them from non-recurring objects. Objects of the sham experiment were two-dimensional shapes, whereas objects of the main experiment were rotating, three-dimensional shapes (see below and Fig. 1A).

Fig. 1.

Experimental paradigm. (A) Observers viewed complex, three-dimensional objects that rotated slowly, presented as sequences with 200 trials (2.5 s presentation time and 0.5 s transition time). During the transition, the previous object vanished to the right while the next object approached from the left (please see https://learnmem.cshlp.org/content/suppl/2021/04/09/28.5.148.DC1/Supplemental_Movie_S1.mp4). Most objects (180 of 200) recurred multiple times within and between sequences (‘recurring’ objects). The others (20 of 200) were presented only once (‘non-recurring’ objects). Over 3 days/sessions, observers viewed 18 sequences and attempted to classify each object as either ‘familiar’ or ‘unfamiliar’. (B) Sequences were generated from quasi-random walks on either a sparse and modular graph, or a fully-connected and non-modular graph (nodes represent recurring objects, and links represent possible successions). Sequences from the left graph exhibit clustered sequential dependencies (‘structured’). Sequences from the right graph lack such dependencies (‘unstructured’). (C) Over three sessions, observers learned to classify recurring and non-recurring objects as ‘familiar’ and ‘unfamiliar’, respectively. Performance was slightly better with structured than with unstructured sequences. For the present group of observers, the difference was significant during the first and the last sessions (*p < 0.05; FDR corrected). With structured sequences, performance continued to improve slightly from the second to the third session.


The main experiment extended over three successive weeks, with three sessions on separate days of both the first and third week (no sessions took place in the second week). The experiments of the first and third weeks differed in four aspects: sequence type (structured or unstructured), the set of recurring objects, object color (red or blue), and responding hand (left or right). All aspects were counterbalanced across observers. With either responding hand, the index finger responded ‘familiar’ and the middle finger responded ‘unfamiliar’. Observers were not informed about the difference in sequence structure.

After the three scanning sessions of a week, observers participated in an additional behavioral session to confirm that they had, in fact, become familiar with every recurring object. Specifically, they performed a spatial search task in which they pointed out recurring target objects among non-recurring distractor objects (Kakaei et al., 2021). In addition, observers were offered the opportunity to voice anything they might have noticed about the experiment.

2.2 Experimental paradigm

Complex three-dimensional objects were computer-generated and presented as described previously (Kakaei et al., 2021). A movie is available at https://learnmem.cshlp.org/content/suppl/2021/04/09/28.5.148.DC1/Supplemental_Movie_S1.mp4. All objects were highly characteristic and dissimilar from each other (as confirmed by computational means). Objects were presented every 3 s, with 2.5 s viewing and 0.5 s transition time (Fig. 1A). Objects were shown from all sides and, after appearing at an arbitrary angle, revolved smoothly for one full turn (period 2.5 s, frequency 0.4 Hz, angular velocity 144°/s) about one of several axes in the frontal plane (−45°, 0°, 45°; clockwise or counter-clockwise). Axes and directions were counterbalanced for each object, and initial viewing angles were chosen randomly (Fig. 1B). All stimuli were generated with MATLAB (The MathWorks, Inc.), presented with the Psychophysics Toolbox (Brainard, 1997), and viewed in a mirror mounted to the MR head coil (screen resolution 960×720 pixels, frame rate 60 Hz, subtending approximately 8°×6° of visual angle, average luminance 50 cd/m², background luminance 5 cd/m²). Observers responded with the index and middle fingers of the right or left hand on an MR-safe response box.

Fifteen objects recurred many times during three sessions (‘recurring’ objects), whereas other objects appeared exactly once (‘non-recurring’ or ‘singular’ objects). As mentioned, observers classified every object as either ‘familiar’ or ‘unfamiliar’ by pressing either the left or right button (counterbalanced) during its presentation. Over the course of three sessions, all observers gradually became familiar with the ‘recurring objects’ (see below). The average time-course of learning, as established by a simplified signal detection analysis, is shown in Figure 1C.

Every session comprised six sequences (‘runs’), each lasting 600 s and presenting 180 ‘recurring’ and 20 ‘non-recurring’ objects (200 objects in total). As there were 15 different recurring objects, each such object was seen 12±1.9 times during every sequence. Over the three sessions (or 18 sequences), each recurring object appeared at least 190 times (mean ± S.D.: 216±9), whereas non-recurring objects appeared only once. Altogether, there were 3,240 presentations of recurring objects (3×6×180) and 360 presentations of non-recurring objects (3×6×20).

2.3 Presentation order

To create conditions with and without predictive temporal context (‘structured’ and ‘unstructured’), sequences were generated as quasi-random walks on graphs representing the 15 recurring objects as nodes and possible continuations as edges (Fig. 1B). Each sequence started at a random node and continued with equal probability on any one of the available edges, except that immediate repetition (XX) and direct returns (XYX) were not allowed. Although generated randomly, sequences were post-selected to counterbalance the number of appearances of both objects and object pairs (Kakaei et al., 2021). Non-recurring objects were interspersed at random sequence locations.
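The walk rule just described can be sketched in a few lines. This is a minimal Python illustration, assuming the graph is given as an adjacency mapping from object index to its neighbours; the helpers `random_walk` and `intersperse` and the labels `new0`, `new1`, … for non-recurring objects are hypothetical, and the post-selection used for counterbalancing (Kakaei et al., 2021) is not reproduced.

```python
import random


def random_walk(adjacency, length, rng):
    """Quasi-random walk: uniform choice among neighbours, excluding XX and XYX."""
    sequence = [rng.choice(sorted(adjacency))]          # random starting node
    while len(sequence) < length:
        current = sequence[-1]
        previous = sequence[-2] if len(sequence) > 1 else None
        # Exclude immediate repetition (XX) and direct return (XYX).
        options = [v for v in sorted(adjacency[current]) if v not in (current, previous)]
        sequence.append(rng.choice(options))
    return sequence


def intersperse(sequence, n_new, rng):
    """Insert n_new hypothetical non-recurring items at random positions."""
    result = list(sequence)
    for k in range(n_new):
        result.insert(rng.randrange(len(result) + 1), f"new{k}")
    return result


# Unstructured condition: every object is linked to every other object.
complete = {v: {u for u in range(15) if u != v} for v in range(15)}
rng = random.Random(1)
walk = random_walk(complete, 180, rng)
run = intersperse(walk, 20, rng)      # 180 recurring + 20 non-recurring = 200 trials
```

Passing the adjacency of the modular graph instead of `complete` would yield structured sequences; a post-selection step would then discard walks whose object and pair counts are insufficiently balanced.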

Structured sequences were generated from the modular graph depicted on the left in Figure 1B. Note that each object is linked to exactly four other objects (i.e., may be preceded or followed by four other objects). Additionally, links are clustered so as to form three “communities” with five objects each. As a result, objects from the same community tended to follow each other: on average, 9±2 successive objects derived from the same community, so that these “community episodes” lasted 27±6 s on average. Moreover, the same object tended to recur at short intervals, and the expected repetition latency of 5.5±15 (median and S.D.) was comparatively short.

In structured sequences, the 105 possible pairings of 15 objects could be divided into four groups, as illustrated further below in Figure 6A. There were 27 pairs from the same community and adjacent on the graph (SA pairs), 3 pairs from different communities and adjacent on the graph (DA pairs), as well as 3 pairs from the same community and non-adjacent on the graph (SN pairs). Finally, 72 pairs were from different communities and non-adjacent on the graph (DN pairs).
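The four pair counts can be checked with a short Python sketch. It assumes one particular layout of the modular graph, namely that of A. C. Schapiro et al. (2013): within each community of five, all objects are linked except the two ‘boundary’ objects, and each boundary object links to a boundary object of a neighbouring community. The helpers `modular_graph` and `classify_pairs` are hypothetical.

```python
from itertools import combinations

N_COMMUNITIES, COMMUNITY_SIZE = 3, 5


def modular_graph():
    """Edges (i < j) and community labels of an assumed 15-node modular graph."""
    n = N_COMMUNITIES * COMMUNITY_SIZE
    community = {v: v // COMMUNITY_SIZE for v in range(n)}
    edges = set()
    for c in range(N_COMMUNITIES):
        first = c * COMMUNITY_SIZE
        last = first + COMMUNITY_SIZE - 1
        for i, j in combinations(range(first, last + 1), 2):
            if (i, j) != (first, last):          # boundary objects are not linked
                edges.add((i, j))
        nxt = ((c + 1) % N_COMMUNITIES) * COMMUNITY_SIZE
        edges.add(tuple(sorted((last, nxt))))    # bridge to the next community
    return edges, community


def classify_pairs(edges, community):
    """Label every unordered pair as SA, SN, DA, or DN."""
    labels = {}
    for i, j in combinations(sorted(community), 2):
        same = "S" if community[i] == community[j] else "D"
        adjacent = "A" if (i, j) in edges else "N"
        labels[(i, j)] = same + adjacent
    return labels


edges, community = modular_graph()
labels = classify_pairs(edges, community)
counts = {c: list(labels.values()).count(c) for c in ("SA", "SN", "DA", "DN")}
print(counts)  # {'SA': 27, 'SN': 3, 'DA': 3, 'DN': 72}
```

Under this layout every object has exactly four neighbours, the SN pairs are the three unlinked boundary pairs, and the DA pairs are the three bridges between communities.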

Note that only SA pairs and DA pairs actually occurred in structured sequences, in the sense that one member occasionally followed the other. Counterbalancing ensured that all objects and all possible object pairs occurred comparably often in presentation sequences (probability approximately 1/60). See Kakaei et al. (2021) for further details about the statistics of presentation sequences.

Unstructured sequences were generated from the graph depicted on the right in Figure 1B. In this graph, each object was linked to all other objects (i.e., it could be preceded or followed by any one of the other objects), so that no sequential dependencies arose beyond the exclusion of immediate repetitions and direct returns. As a result, the same object rarely recurred at short intervals, and the expected repetition latency of 10.5±11 (median and S.D.) was comparatively long.

2.4 MRI acquisition

All magnetic-resonance images were acquired on a 3T Siemens Prisma scanner with a 64-channel head coil. Structural images were T1-weighted sequences (MPRAGE, TR = 2,500 ms, TE = 2.82 ms, TI = 1,100 ms, 7° flip angle, isotropic resolution of 1×1×1 mm, and matrix size of 256×256×192). Functional images were T2*-weighted sequences (TR = 1,000 ms, TE = 30 ms, 65° flip angle, resolution of 3×3×3.6 mm, and matrix size of 72×72×36). Field maps were obtained by gradient dual-echo sequences (TR = 720 ms, TE1 = 4.92 ms, TE2 = 7.38 ms, resolution of 1.594×1.594×2 mm, and matrix size of 138×138×72).

2.5 fMRI pre-processing

Our approach to fMRI analysis was influenced by recent advances in comparing uni- and multivariate responses of corresponding voxels between different observers (Kumar et al., 2022; Nastase et al., 2019). The local correlation structure of voxel responses is surprisingly similar across observers and provides a solid basis for functional parcellation (Dornas & Braun, 2018). Such a parcellation obviates ‘searchlight’ strategies and can define high-dimensional multivariate activity in corresponding ‘parcels’ for different observers.

The fMRI pre-processing procedure was similar to that published previously (Dornas & Braun, 2018). Brain tissues were extracted and segmented using BET (Smith, 2002) and FAST (Zhang et al., 2001). Field map correction, head motion correction, spatial smoothing, high-pass temporal filtering, and registration to structural and standard images were performed with the MELODIC package of FSL (Beckmann & Smith, 2004). Field map correction and registration to the structural image were carried out using Boundary-Based Registration (BBR; Greve & Fischl, 2009). MELODIC uses MCFLIRT (Jenkinson et al., 2002) to correct for head motion. Spatial smoothing was performed with SUSAN (Smith & Brady, 1997), with the full width at half maximum set to FWHM = 5 mm. To remove low-frequency artifacts, we applied a high-pass filter with a cut-off frequency of f = 0.01 Hz; that is, oscillations/events with periods longer than 100 s were removed. To register the structural image to the Montreal Neurological Institute MNI152 standard space with isotropic 2 mm voxel size, we used FLIRT (FMRIB’s Linear Image Registration Tool; Jenkinson & Smith, 2001; Jenkinson et al., 2002) with 12 degrees of freedom (DOF), and FNIRT (FMRIB’s Nonlinear Image Registration Tool) for the non-linear registration. To further reduce artifacts arising from head motion, we applied despiking with a threshold of λ = 100 using the BrainWavelet toolbox (Patel et al., 2014). Next, we regressed out the mean CSF activity as well as 12 DOF translation and rotation factors predicted by the motion correction algorithm (MCFLIRT). Afterward, the time series of each voxel was whitened and detrended. This resulted in a temporal signal-to-noise ratio (average over the time series, divided by standard deviation over the time series) of approximately 200, with a standard deviation of ±30 over voxels and of ±90 over observers.
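The temporal signal-to-noise ratio defined above reduces to a one-line computation per voxel. A minimal numpy sketch, assuming the data of one run are held as a (time × voxels) array; `temporal_snr` is a hypothetical helper, and the synthetic data are for illustration only.

```python
import numpy as np


def temporal_snr(ts):
    """Per-voxel tSNR: mean over time divided by standard deviation over time."""
    return ts.mean(axis=0) / ts.std(axis=0, ddof=1)


rng = np.random.default_rng(0)
ts = 200.0 + rng.standard_normal((600, 50))   # synthetic: 600 volumes (TR = 1 s), 50 voxels
tsnr = temporal_snr(ts)
print(tsnr.mean(), tsnr.std())                # average and spread over voxels
```

Averaging `tsnr` over voxels and then comparing the averages between observers gives the two kinds of spread quoted above.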

Finally, the 160,099 voxels of MNI152 space were grouped into 758 functional parcels according to the MD758 atlas (Dornas & Braun, 2018). Each functional parcel is associated with an anatomically labeled region of the AAL atlas (Tzourio-Mazoyer et al., 2002) and comprises approximately 200 voxels or approximately 1.7 cm³ of gray matter volume (212±70 voxels, range 45 to 462 voxels). Parcels were defined for a small population of observers so as to maximize signal covariance within parcels and minimize covariance between parcels in the resting state. In contrast to other parcellation schemes, this was based exclusively on the (typically strong) functional correlations within each anatomical region and disregarded the (typically weak) correlations between different anatomical regions. The MD758 parcellation offers superior cluster quality, correlational structure, sparseness, as well as consistency with fiber tracking, compared to other parcellations of similar resolution (Albers et al., 2021; Dornas & Braun, 2018).

2.6 fMRI data analysis

To study the effect of sequence structure on the neural representation of object shape, we extracted the multivoxel activity pattern at $N_t = 9$ time points following object onset. In a functional parcel with $N_{vox}$ voxels, this response pattern constituted a point (or vector) in an $N_{dim}$-dimensional space, where $N_{dim} = N_t N_{vox}$ (Fig. 2A). Our objective was to compare distances between response patterns to the same object, to different objects in the same community, and to different objects in different communities; in other words, to analyze representational similarity or dissimilarity in terms of the standardized Euclidean (Mahalanobis) distance between responses in a high-dimensional space (RSA; Kriegeskorte & Diedrichsen, 2019). Over all 758 parcels, response dimensionality was $N_{dim}$ = 1,911±634 (mean and standard deviation), with a range of 405 to 4,158.
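The construction of response vectors can be sketched as follows. This minimal numpy example assumes that onsets fall on volume boundaries (TR = 1 s, as in the functional acquisition) and takes the 9 volumes from 2 to 11 s after onset, as in Fig. 2A. It uses the diagonal form of the Mahalanobis distance (the standardized Euclidean distance, with per-dimension variances estimated across trials); whether a full covariance was used is not specified here. All names are hypothetical, and the data are synthetic.

```python
import numpy as np

N_T, START = 9, 2   # 9 volumes, starting 2 s after onset (TR = 1 s assumed)


def response_vector(bold, onset):
    """Flatten the (N_t x N_vox) window after an onset into one N_t*N_vox vector."""
    window = bold[onset + START: onset + START + N_T]
    return window.reshape(-1)


def standardized_euclidean(x, y, variance):
    """Euclidean distance with each dimension scaled by its variance (diagonal Mahalanobis)."""
    return np.sqrt(np.sum((x - y) ** 2 / variance))


rng = np.random.default_rng(0)
n_vox = 212                                    # a typical parcel size
bold = rng.standard_normal((600, n_vox))       # synthetic run: 600 volumes
onsets = np.arange(0, 580, 3)                  # one object every 3 s
R = np.stack([response_vector(bold, t) for t in onsets])
variance = R.var(axis=0, ddof=1)               # per-dimension variance across trials
d01 = standardized_euclidean(R[0], R[1], variance)
```

For a parcel of 212 voxels this yields $N_{dim} = 9 \times 212 = 1908$, close to the average dimensionality reported above.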

Fig. 2.

Direct linear discriminant analysis (DLDA) of multivariate BOLD signals. For each observer and functional parcel, we identified a 14-dimensional space that optimally discriminated the 15 classes of activity patterns associated with recurring objects. Typically, this space was contained largely within the space of the 14 leading principal components (88±5% of variance), but excluded shared variance associated with all object presentations. (A) For a given parcel with $N_{vox}$ voxels (yellow: Inf-Front-Oper-R, parcel 146), activity was recorded over 9 s during and after object presentation (from 2 to 11 s after onset). Each such activity pattern corresponded to a point in a $9N_{vox}$-dimensional vector space (right), here represented schematically by spheres (red, green, and blue). Images exemplify average responses to three objects with a color scale. (B) In the optimally discriminative subspace, $S_{14}$, Euclidean distance measures the representational similarity of different responses to the same object.


To analyze the response variance that discriminates the 15 recurring objects, we reduced dimensionality with Fisher’s Linear Discriminant Analysis (LDA) for multiple classes, identifying the (at most) (κ−1)-dimensional subspace S that optimally discriminates κ = 15 classes of activity patterns (i.e., responses to the 15 recurring objects). Here, optimality is defined as simultaneously minimizing within-class variance and maximizing between-class variance of activity patterns. This approach corresponds to a ‘supervised’ principal component analysis and yields (κ−1) informative dimensions.

To interpret the results, it is important to appreciate the commonality with principal component analysis (PCA). Over all 758 parcels, the first 14 principal components captured 64±5% of the total response variance following an object presentation. However, about one-third of this variance was shared between presentations and was thus uninformative about the identity of the presented object. The 14-dimensional subspaces S identified by LDA captured the remaining two-thirds (66±5%) of the PCA variance, which was informative about the presented objects. In fact, almost all of the subspace variance (88±5%) overlapped with the space of the 14 leading principal components. Moreover, the subspaces S tended to distribute variance more uniformly over dimensions (3±3% per dimension) than principal components did (4±6% per dimension).

This commonality between LDA and PCA explains why subspaces S captured response variance under all conditions (non-recurring objects, non-selective parcels), not just under the conditions for which they had been optimized. A numerically tractable procedure for identifying the optimal subspace S is ‘direct LDA’ or DLDA (Ye et al., 2006; Yu & Yang, 2001); a Matlab implementation of DLDA is available at github.com/cognitive-biology/DLDA.
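A minimal numpy sketch of direct LDA, following the two-step recipe of Yu & Yang (2001): first diagonalize the between-class scatter and discard its null space, then diagonalize the within-class scatter inside the remaining (at most κ−1)-dimensional space. This is not the Matlab code at github.com/cognitive-biology/DLDA; the final rescaling, which whitens the within-class scatter, is one of several possible normalizations, and `direct_lda` is a hypothetical name.

```python
import numpy as np


def direct_lda(X, y, tol=1e-10):
    """Return a (d, m) projection, m <= C - 1, via direct LDA (Yu & Yang, 2001)."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    class_means = np.stack([X[y == c].mean(axis=0) for c in classes])
    # Between-class scatter Sb = Phi_b @ Phi_b.T, one column per class.
    counts = np.array([np.sum(y == c) for c in classes])
    Phi_b = (np.sqrt(counts)[:, None] * (class_means - mu)).T     # (d, C)
    # Eigen-decompose the small C x C matrix instead of the d x d scatter.
    evals, evecs = np.linalg.eigh(Phi_b.T @ Phi_b)
    keep = evals > tol * evals.max()                # drop the null space of Sb
    V = Phi_b @ evecs[:, keep]                      # eigenvectors of Sb
    V /= np.linalg.norm(V, axis=0)
    Z = V / np.sqrt(evals[keep])                    # now Z.T @ Sb @ Z = I
    # Within-class scatter, projected onto Z, then diagonalized.
    Xc = X - class_means[np.searchsorted(classes, y)]
    W = Xc @ Z
    dw, U = np.linalg.eigh(W.T @ W)
    dw = np.maximum(dw, tol * dw.max())             # guard against singular Sw
    return Z @ U / np.sqrt(dw)                      # whitens within-class scatter


rng = np.random.default_rng(0)
means = rng.normal(scale=3.0, size=(3, 20))
X = np.vstack([m + rng.standard_normal((40, 20)) for m in means])
y = np.repeat([0, 1, 2], 40)
P = direct_lda(X, y)
S = X @ P                                           # trials in the discriminant subspace
```

With κ = 15 classes, as for the recurring objects, the same function would return at most 14 columns. Working with the small κ×κ matrix rather than the full scatter matrix is what keeps the method tractable when $N_{dim}$ is in the thousands.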

The generic nature of subspaces S also permitted us to investigate the representation of “temporal context” in this way. Specifically, we analyzed the representation of temporal communities with data from structured sequences (8 observers) and performed identical analyses on data from unstructured sequences (8 observers) for comparison. As detailed further below, spurious ‘effects’ of community structure can be observed due to systematic and/or unsystematic fluctuations of responsiveness over time. To guard against such spurious effects, we removed the effects of temporal proximity and verified that our analyses yielded null results with data from unstructured sequences.

2.6.1 Amplitudes, distances, and temporal correlations

Note that the straightforward approach of decoding community identity (i.e., “community selectivity”) would have been confounded by object identity, as any selectivity for “object identity” would necessarily have entailed also some degree of selectivity for “community identity”. To sidestep this issue, we devised a somewhat weaker yet independent measure—“community sensitivity”—which compared pairwise distances between responses to objects within and between communities, as detailed further below.

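Activity patterns $x_{jk}$ (dimension $j$, trial $k$) were analyzed in the maximally discriminative subspace S. The average normalized amplitude $a_k = \sqrt{\frac{1}{\kappa-1}\sum_{j=1}^{\kappa-1} x_{jk}^2}$ was 0.99, and the average normalized distance $d_{kl} = \sqrt{\frac{1}{\kappa-1}\sum_{j=1}^{\kappa-1} (x_{jk}-x_{jl})^2}$ between patterns from trials $k$ and $l$ was 1.40. This value corresponds to the normalized distance expected between random patterns, as the average Euclidean distance between two random points on an $n$-dimensional hypersphere of unit radius is

$$d_{ave} = \frac{2^{n}\,\Gamma\!\left(\frac{n+1}{2}\right)^{2}}{\sqrt{\pi}\,\Gamma\!\left(n+\frac{1}{2}\right)} \qquad (1)$$

with $d_{ave} \approx 1.4017$ for $n = 14$.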
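The closed form for the mean chord length on a unit hypersphere, $d_{ave} = 2^{n}\Gamma\left(\frac{n+1}{2}\right)^2 / \left(\sqrt{\pi}\,\Gamma\left(n+\frac{1}{2}\right)\right)$, can be checked numerically. The Python sketch below assumes the convention that an $n$-dimensional hypersphere is the unit sphere embedded in $n+1$ dimensions (so $n = 2$ is the ordinary sphere, with mean chord length 4/3); under that convention the closed form gives 1.4017 for $n = 14$, and a Monte Carlo estimate agrees. The helper names are hypothetical.

```python
import math

import numpy as np


def mean_chord_distance(n):
    """Closed form: mean distance between two uniform points on the unit n-sphere.

    Assumed convention: the n-sphere is embedded in n + 1 dimensions.
    """
    log_d = (n * math.log(2.0) + 2.0 * math.lgamma((n + 1) / 2)
             - 0.5 * math.log(math.pi) - math.lgamma(n + 0.5))
    return math.exp(log_d)


def monte_carlo_chord_distance(n, n_pairs=200_000, seed=0):
    """Monte Carlo estimate: normalized Gaussian vectors are uniform on the sphere."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_pairs, n + 1))
    y = rng.standard_normal((n_pairs, n + 1))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    y /= np.linalg.norm(y, axis=1, keepdims=True)
    return float(np.linalg.norm(x - y, axis=1).mean())


print(round(mean_chord_distance(14), 4))   # 1.4017
```

Working with log-gamma values avoids overflow of the factorial-sized terms for larger $n$; as $n$ grows, $d_{ave}$ approaches $\sqrt{2}$.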

On successive trials, activity patterns exhibited a weak temporal correlation, with approximately 5% smaller distances at delays below 4 trials and approximately 2% larger distances at delays ranging from 6 to 15 trials. Supplementary Figure S2 shows the delay-dependent distance between response pairs, as well as the pairwise distance within runs, averaged over all parcels and observers. The delay-dependence of response distances to the same objects was comparable in identity-selective and non-selective parcels, although the delay-dependence of distances to different objects was slightly more pronounced in non-selective parcels. In contrast to multivariate response distances, we did not observe any effect of delay on multivariate response amplitudes (i.e., we observed neither repetition suppression nor repetition facilitation).

To correct for this temporal correlation, we established for each parcel $w$ the average delay-dependent distance $T_w(\Delta_i) = \langle d_{w,u,r}(\Delta_i) \rangle_{u,r}$ between patterns with relative delay $\Delta_i$, where the average was taken over subjects $u$ and runs $r$. The time-course $T_w$ allowed us to subtract the average effect of temporal correlation by computing the residual distance

$$d^{\mathrm{corrected}}_{w,u,r}(\Delta_i) = d_{w,u,r}(\Delta_i) - T_w(\Delta_i) + \langle T_w(\Delta_i) \rangle_{\Delta_i},$$

where $\langle T_w(\Delta_i) \rangle_{\Delta_i}$ is the average value over delays $\Delta_i$.
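A minimal sketch of this correction for a single parcel follows. It assumes, hypothetically, that distances are stored as (delay, distance) pairs for each (subject, run); $T_w$ is estimated by pooling all pairs at a given delay, and the grand mean is taken over distinct delays. The data layout and function name are illustrative, not the authors' code.

```python
from collections import defaultdict
from statistics import mean


def correct_for_delay(distances):
    """distances: {(subject, run): [(delay, distance), ...]} for one parcel w.
    Returns (corrected, T), where corrected has the same layout with
    d - T_w(delay) + <T_w>_delay, and T maps delay -> T_w(delay)."""
    by_delay = defaultdict(list)
    for pairs in distances.values():
        for delay, d in pairs:
            by_delay[delay].append(d)
    # T_w(delay): average distance at each delay, over subjects and runs.
    T = {delay: mean(ds) for delay, ds in by_delay.items()}
    # <T_w>: average of T_w over the distinct delays.
    T_mean = mean(T.values())
    corrected = {key: [(delay, d - T[delay] + T_mean) for delay, d in pairs]
                 for key, pairs in distances.items()}
    return corrected, T


data = {("u1", "r1"): [(1, 1.33), (2, 1.33), (5, 1.40), (6, 1.43)],
        ("u1", "r2"): [(1, 1.35), (2, 1.31), (5, 1.40), (6, 1.45)]}
corrected, T = correct_for_delay(data)
```

After correction, the mean residual distance is identical at every delay, so any remaining within/between differences cannot be explained by the average delay profile.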

2.6.2 Measure of identity-selectivity

Selectivity for object identity was quantified in terms of “classification accuracy”, $\alpha_{\mathrm{identity}}$, which was defined as the probability that a multivariate response was classified correctly on the basis of distance to class centroids. To test for statistical significance, we relied on the “minimum accuracy” over all observers or data sets (Allefeld et al., 2016). Further details are provided in the companion paper (Kakaei & Braun, 2024).
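The classification rule can be sketched as a nearest-centroid classifier. The leave-one-out scheme below is an assumption made for illustration (the cross-validation actually used is described in Kakaei & Braun, 2024), and the function names are hypothetical. Per-observer accuracies obtained this way would then enter the minimum statistic of Allefeld et al. (2016), which this sketch does not implement.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    m = len(vectors)
    return [sum(v[j] for v in vectors) / m for j in range(len(vectors[0]))]


def nearest_centroid_accuracy(patterns, labels):
    """Leave-one-out accuracy: each pattern is assigned to the class whose
    centroid (computed without that pattern) is closest in Euclidean distance."""
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)
    correct = 0
    for k, (x, lab) in enumerate(zip(patterns, labels)):
        best, best_dist = None, math.inf
        for cand, members in by_label.items():
            train = [patterns[i] for i in members if i != k]
            if not train:  # class has no remaining members
                continue
            dist = math.dist(x, centroid(train))
            if dist < best_dist:
                best, best_dist = cand, dist
        correct += best == lab
    return correct / len(patterns)
```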

2.6.3 Geometry of temporal community representations

To assess the representation of community structure, we compared pairwise distances between responses to objects within and between communities for each parcel $w$. Specifically, we first obtained pairwise distances $d_{ij}$ and sorted them into two groups: within-community distances with average $D_w^{W} = \langle d_{ij} \rangle_{(ij \mid i,j \in L)}$ and between-community distances with average $D_w^{B} = \langle d_{ij} \rangle_{(ij \mid i \in L,\, j \in K,\, L \neq K)}$, where $L$ and $K$ denote communities. Then, we established the signed difference $\Delta_w^{BW} = D_w^{B} - D_w^{W}$, which we termed “community sensitivity”, and assessed its statistical significance with a two-sample t-test. After correcting for false discovery (Benjamini & Hochberg, 1995), we summarized the results for each parcel in terms of t-statistics $t^{BW}$.
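The within/between comparison and the false-discovery correction can be sketched as follows. The input layout (a list of (i, j, distance) triples plus a community label per object), the pooled-variance form of the two-sample t-test, and the function names are assumptions. Converting $t$ into a p-value would additionally require the t-distribution (e.g., from SciPy), which is omitted to keep the sketch dependency-free.

```python
import math
from statistics import mean, variance


def community_sensitivity(triples, community):
    """triples: iterable of (i, j, distance) for response pairs of objects i, j.
    community: sequence mapping object index -> community label.
    Returns (Delta_BW, t_BW), with Delta_BW = mean(between) - mean(within)
    and t_BW from a pooled-variance two-sample t-test."""
    within, between = [], []
    for i, j, d in triples:
        (within if community[i] == community[j] else between).append(d)
    n_b, n_w = len(between), len(within)
    delta = mean(between) - mean(within)
    pooled = ((n_b - 1) * variance(between) + (n_w - 1) * variance(within)) / (n_b + n_w - 2)
    t = delta / math.sqrt(pooled * (1 / n_b + 1 / n_w))
    return delta, t


def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns one rejection flag per p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    k_max = 0
    for rank, k in enumerate(order, start=1):
        if pvals[k] <= q * rank / m:
            k_max = rank  # largest rank with p_(rank) <= q * rank / m
    rejected = [False] * m
    for rank, k in enumerate(order, start=1):
        rejected[k] = rank <= k_max
    return rejected
```

A positive $t^{BW}$ corresponds to responses that are closer within than between communities (positive sensitivity); a negative value corresponds to the reverse.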

A similar procedure was used to assess differences between classes of object pairs. Specifically, for every parcel, we established the average pairwise distance $D_w$ (averaged over all pairs and all observers) for different classes of object pairs: same community & adjacent (SA), same community & non-adjacent (SN), different communities & adjacent (DA), and different communities & non-adjacent (DN). The resulting values were termed $D_w^{SA}$, $D_w^{DA}$, $D_w^{SN}$, and $D_w^{DN}$. Statistical significance was assessed by comparing each of these values to the pairwise distance $D_w^{\mathrm{diff}}$, which comprises pairwise distances of all 4 types of object pairs, using a two-sample t-test. The results were summarized in terms of t-statistics $t_w^{SA}$, $t_w^{DA}$, $t_w^{SN}$, and $t_w^{DN}$. The behavioral evidence (Kakaei et al., 2021) informed our a-priori hypothesis that SA pairs might be more similar, and DN pairs more dissimilar, than the overall average. A further a-priori hypothesis was that DA pairs might be more dissimilar, as they involve the “linking objects” that mark transitions between different communities.
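To make the four pair classes concrete, the sketch below labels all 105 object pairs. It assumes the 15-object modular graph of A. C. Schapiro et al. (2013), with objects $5c$ to $5c+4$ forming community $c$ and the first and last object of each community acting as boundary ("linking") objects. This layout and the helper names are hypothetical and serve only to illustrate the classification.

```python
from collections import Counter
from itertools import combinations

N_COMM, SIZE = 3, 5  # assumed: 3 communities of 5 objects each
N = N_COMM * SIZE


def community_of(obj: int) -> int:
    return obj // SIZE


def modular_graph_edges() -> set:
    """Undirected edges of the assumed modular graph. Within each community all
    pairs are connected except the two boundary objects (first and last), and the
    last object of community c links to the first object of community c + 1."""
    edges = set()
    for c in range(N_COMM):
        first, last = c * SIZE, c * SIZE + SIZE - 1
        for i, j in combinations(range(first, last + 1), 2):
            if (i, j) != (first, last):
                edges.add((i, j))
        nxt = ((c + 1) % N_COMM) * SIZE
        edges.add((min(last, nxt), max(last, nxt)))
    return edges


def pair_type(i: int, j: int, edges: set) -> str:
    """Return 'SA', 'SN', 'DA' or 'DN' for objects i and j."""
    same = community_of(i) == community_of(j)
    adjacent = (min(i, j), max(i, j)) in edges
    return ("S" if same else "D") + ("A" if adjacent else "N")


edges = modular_graph_edges()
counts = Counter(pair_type(i, j, edges) for i, j in combinations(range(N), 2))
print(dict(counts))  # counts: SA 27, SN 3, DA 3, DN 72
```

Under this assumed layout, the three DA pairs are exactly the between-community links, and the three SN pairs are the boundary objects within each community.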

Note that response distances within and between communities are confounded by temporal proximity, because responses within communities tend to have shorter relative latencies than responses between communities (A. C. Schapiro et al., 2013). To assess the degree to which temporal proximity contaminates the observed community signal, we repeated the analysis of community representations for different ranges of temporal latencies. Specifically, we recalculated the average pairwise distances $D_w^{\mathrm{between}}$ and $D_w^{\mathrm{within}}$, and the corresponding $t_w^{BW}$, for object pairs $i, j$ whose relative latencies $\tau_{ij}$ were bounded from below by $\tau_{LB} \le \tau_{ij}$ and from above by sequence termination, with the lower bound ranging over $\tau_{LB} \in \{1, \ldots, 30\}$. The t-statistics of response pairs with bounded latencies and their corresponding p-values, corrected for false discovery rate, will be denoted as $t^{BW}(\tau_{LB})$ and $P^{BW}(\tau_{LB})$, respectively.
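The latency-bounded analysis can be sketched as a loop over lower bounds. The input layout (a list of (distance, latency, same_community) tuples for one parcel), the pooled-variance t-statistic, and the function names are assumptions; the upper bound given by sequence termination is implicit, because only latencies that actually occur are present in the input.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t-statistic with pooled variance for mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def t_bw_by_lower_bound(pairs, max_bound=30):
    """pairs: iterable of (distance, latency, same_community) for one parcel.
    Returns {tau_LB: t_BW}, using only pairs with latency >= tau_LB."""
    pairs = list(pairs)
    result = {}
    for tau_lb in range(1, max_bound + 1):
        kept = [(d, same) for d, lat, same in pairs if lat >= tau_lb]
        between = [d for d, same in kept if not same]
        within = [d for d, same in kept if same]
        if len(between) < 2 or len(within) < 2:
            result[tau_lb] = math.nan  # too few pairs left at this bound
        else:
            result[tau_lb] = pooled_t(between, within)
    return result
```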

To assess whether community representations are consistent over different latency ranges, we examined how $t^{BW}(\tau_{LB})$ changes with its lower bound $\tau_{LB}$. Specifically, for each parcel, we defined a consistency measure $\tau_{sig}$ as the highest lower bound such that $t^{BW}(\tau_{LB})$ remains significant for all $\tau_{LB} \le \tau_{sig}$. We considered a parcel ‘community sensitive’ only if $\tau_{sig} = 30$. In other words, a ‘community sensitive’ parcel exhibited significant between-community separability $t^{BW}$ for all lower bounds $\tau_{LB} \in \{1, \ldots, 30\}$. This ruled out the possibility that community sensitivity was a spurious effect of temporal proximity (which was strongest at shorter latencies).
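The consistency criterion can be sketched as follows, assuming a dictionary of FDR-corrected p-values keyed by contiguous lower bounds starting at 1. The function names and the significance level argument are illustrative.

```python
def tau_sig(p_by_bound, alpha=0.05):
    """Largest lower bound tau such that p < alpha for every bound 1..tau.
    p_by_bound: {tau_LB: FDR-corrected p-value}, keys assumed contiguous from 1.
    Returns 0 if the test is already non-significant at the smallest bound."""
    tau = 0
    for bound in sorted(p_by_bound):
        if p_by_bound[bound] < alpha:
            tau = bound
        else:
            break
    return tau


def is_community_sensitive(p_by_bound, alpha=0.05, max_bound=30):
    """Strictest criterion: significant at every lower bound up to max_bound."""
    return tau_sig(p_by_bound, alpha) >= max_bound
```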

2.6.4 Statistical power

The representational similarity analysis concerning object identity described in Kakaei and Braun (2024) was based on approximately 216 object responses (18 sequences with approximately 12 recurrences of each object) from each of 16 data sets, affording approximately 370,000 representational distances for each of the 105 object pairs. In contrast, the assessment of representational similarity concerning community was based on approximately 120 community episodes (18 sequences with approximately 6 recurrences of each community) from each of 8 data sets, affording approximately 57,000 representational distances for each of the 3 community pairs. Hence, the number of independent pairwise observations about identity was approximately 225 times larger than the number about community. Because the standard error of an estimate scales with the inverse square root of the number of observations ($\sqrt{225} = 15$), the sensitivity of our paradigm for detecting community sensitivity is expected, on purely statistical grounds, to be approximately 15 times lower than for detecting identity selectivity.
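The arithmetic behind these ratios can be sketched as follows, assuming that the number of observations is the number of distances per pair multiplied by the number of pairs, and that standard errors scale as $1/\sqrt{N}$. The variable names are illustrative.

```python
import math

identity_distances_per_pair = 370_000   # per object pair
identity_pairs = 105                    # 15 objects -> 15 * 14 / 2 pairs
community_distances_per_pair = 57_000   # per community pair
community_pairs = 3                     # 3 communities -> 3 pairs

ratio = (identity_distances_per_pair * identity_pairs) / (
    community_distances_per_pair * community_pairs)
# Standard errors shrink as 1 / sqrt(N), so detectability differs by sqrt(ratio).
sensitivity_ratio = math.sqrt(ratio)
print(f"observation ratio ~ {ratio:.0f}, sensitivity ratio ~ {sensitivity_ratio:.1f}")
# observation ratio ~ 227, sensitivity ratio ~ 15.1
```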

As a consequence of this statistical disparity, we were unable to establish the temporal development of “community sensitivity” over the 3 days/sessions (see Supplementary Fig. S8). For “object identity”, we could demonstrate temporal developments not only over days/sessions but even over individual runs (Kakaei & Braun, 2024).

2.6.5 Dimensional reduction

To visualize the representational geometry of community structure in two dimensions, we calculated a distance matrix $D_{w,u,r}(i,j) = d_{ij}$ of response distances corrected for temporal proximity within each run $r$, for every parcel $w$ and observer $u$. Averaging over runs produced matrices $D_{w,u}$ of size $15 \times 15$ containing the average distances between the 15 recurring objects in the discriminative subspace $S$.

As we did not expect different observers to exhibit comparable activity patterns and distance matrices, we did not wish to average these matrices directly. To sidestep this difficulty, we permuted the object order of each matrix $10^4$ times while maintaining graph structure (adjacency and module membership), first obtaining an ensemble-average matrix $\bar{D}_{w,u}$ for each observer and then the observer average $\bar{\bar{D}}_w$ of these ensemble averages.
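The permutation-averaging step can be sketched as follows, assuming the 15-object modular graph of A. C. Schapiro et al. (2013) with objects $5c$ to $5c+4$ forming community $c$. Instead of drawing $10^4$ random permutations as described above, this hypothetical sketch enumerates all 1,296 permutations generated by rotating communities, reversing the direction of the path, and shuffling the interior objects of each community, so that the resulting average is invariant under these relabelings. All names are illustrative.

```python
import random
from itertools import combinations, permutations, product

N_COMM, SIZE = 3, 5   # assumed: 3 communities of 5 objects
N = N_COMM * SIZE     # objects 5c..5c+4 form community c; 5c and 5c+4 are boundary objects


def modular_graph_edges():
    """Undirected edges of the assumed modular graph."""
    edges = set()
    for c in range(N_COMM):
        first, last = c * SIZE, c * SIZE + SIZE - 1
        for i, j in combinations(range(first, last + 1), 2):
            if (i, j) != (first, last):
                edges.add((i, j))
        nxt = ((c + 1) % N_COMM) * SIZE
        edges.add((min(last, nxt), max(last, nxt)))
    return edges


def structure_preserving_permutations():
    """Rotate communities, optionally reverse the path direction, and shuffle
    the three interior objects of each community independently."""
    perms = []
    for shift in range(N_COMM):
        for reflect in (False, True):
            for inner in product(permutations((1, 2, 3)), repeat=N_COMM):
                p = [0] * N
                for c in range(N_COMM):
                    for k in range(SIZE):
                        kk = inner[c][k - 1] if 1 <= k <= 3 else k
                        c2 = (c + shift) % N_COMM
                        if reflect:
                            c2, kk = (-c2) % N_COMM, SIZE - 1 - kk
                        p[c * SIZE + k] = c2 * SIZE + kk
                perms.append(p)
    return perms


def ensemble_average(D, perms):
    """A[i][j] = mean over permutations p of D[p[i]][p[j]]."""
    A = [[0.0] * N for _ in range(N)]
    for p in perms:
        for i in range(N):
            for j in range(N):
                A[i][j] += D[p[i]][p[j]]
    return [[a / len(perms) for a in row] for row in A]


def observer_average(matrices):
    """Element-wise mean of the per-observer ensemble averages."""
    return [[sum(M[i][j] for M in matrices) / len(matrices) for j in range(N)]
            for i in range(N)]


if __name__ == "__main__":
    rng = random.Random(0)
    D = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            D[i][j] = D[j][i] = 1.0 + rng.random()  # placeholder distances
    A = ensemble_average(D, structure_preserving_permutations())
```

Because community rotation is part of the averaging, the averaged matrix is necessarily three-fold symmetric, which is why any map derived from it shows that symmetry.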

Using multidimensional scaling (Matlab function mdscale), we converted the observer average matrix $\bar{\bar{D}}_w$ to a two-dimensional map of 15 locations approximating these pairwise distances. These maps reveal the average response distance between objects within and between communities, as well as the average distance between ‘linking’ and other objects. Note that the three-fold rotational symmetry of these maps is owed to the permutation procedure.

3.1 Behavior

Observers readily became familiar with recurring objects, as confirmed by the time course of performance in classifying objects as ‘familiar’ (recurring) or ‘novel’ (non-recurring), which exceeded 75% correct after one session and approached 90% correct after two further sessions (Fig. 1C). Typically, the classification of a particular object changed from ‘novel’ to ‘familiar’ at a particular point in time (the “onset of familiarity”). After the experiment, several observers mentioned having invented linguistic labels for each recurring object (‘anchor’, ‘butterfly’, ‘hedgehog’, etc.). Some observers mentioned noticing that objects repeated in close temporal proximity in the ‘structured condition’. However, no observer mentioned noticing that the recurring objects formed three distinct “communities”.

In the structured condition, familiarity increased slightly faster and “onsets of familiarity” occurred somewhat sooner. Specifically, performance was slightly but significantly higher during much of the first and third sessions, and comparable in the second session (Fig. 1C). Moreover, after an object became familiar, the next object to do so was significantly more likely than chance to be a ‘same adjacent’ (SA) object and significantly less likely to be a ‘different non-adjacent’ (DN) object. Specifically, the frequency of successive onsets of familiarity was elevated by 0.15 (p<0.05) for SA pairs, and reduced by 0.15 (p<0.05) for DN pairs, but did not differ significantly for either DA pairs (“linking objects”) or SN pairs.

Average reaction times mirrored the performance results in that they were longer before than after the “onset of familiarity” (p<0.01). In the structured condition, reaction times for linking objects (members of DA pairs) and internal objects (all others) did not differ significantly (at p<0.01) in any of the three sessions. Thus, the behavioral effects of sequence structure did not extend to reaction times, consistent with the behavioral results reported previously (Kakaei et al., 2021).

3.2 Representation of temporal community structure

To assess the effects of temporal community structure, we analyzed the multivariate BOLD activity of each brain parcel over 9 s (9 TR), starting with the onset of object presentation. Specifically, we analyzed linear distances between multivariate responses after reducing the dimensionality of the originally $O(1000)$-dimensional responses to the 14 dimensions of an ‘optimal subspace’ $S$. We chose this subspace so as to maximize the discriminability of responses to different recurring objects, using Fisher’s linear discriminant analysis (LDA). Unlike principal component spaces, the optimal subspaces disregarded variance that was shared between responses to different objects and emphasized variance that distinguished responses to different objects. The dimensionality of the subspace (14) reflected the number of recurring objects (15), since LDA with 15 classes yields at most 14 discriminant dimensions, and was large enough to capture the major part of the response variance.

Over all 758 parcels, the first 14 principal components captured 64±5% of the total response variance following an object presentation. However, about one-third of this variance was shared between presentations of different objects and was thus uninformative about the objects. The 14-dimensional subspaces S identified by LDA captured the remaining two-thirds (66±5%) of the PCA variance, which were informative about the objects present. Moreover, the subspaces S tended to distribute variance more uniformly over dimensions (3±3% per dimension) than principal components did (4±6% per dimension).

In principle, response distances could have reflected temporal community structure in different ways, as illustrated schematically in Figure 3. For example, responses to objects in the same community could be systematically closer together than to objects in different communities, indicating greater representational similarity (‘positive sensitivity’; Fig. 3A). Alternatively, responses to objects in the same community could be systematically further apart than to objects in different communities, indicating less representational similarity (‘negative sensitivity’; Fig. 3B). A third possibility would be no systematic relationship between response distance and community membership (Fig. 3C). Optimal subspaces were chosen so as to maximize distances between objects regardless of temporal community and thus favored none of these possibilities. In fact, optimal subspaces were computed in the same way whether or not temporal communities were present (structured and unstructured conditions).

Fig. 3.

Possible representations of temporal community structure (highly schematic). Disks represent multivariate responses to 6 objects, and two-dimensional distances represent multivariate distance. Colors represent temporal communities. In each case, the average distance within and between communities is provided ($D^{W}$ and $D^{B}$, respectively). (A) Responses to objects are closer within than between communities (dotted boxes). (B) Responses to objects are further apart within than between communities. (C) Responses to objects are, on average, comparably distant within and between communities.

A difficulty in assessing community sensitivity is that it is confounded by the known temporal auto-correlation of multivariate activity (A. C. Schapiro et al., 2013). As illustrated in Figure 4A, pairwise distances were computed for all observers, runs, and parcels $w$, to obtain the average pairwise distance $D_w^{\mathrm{within}}$ within communities, the average pairwise distance $D_w^{\mathrm{between}}$ between communities, and the average separability $\Delta_w^{BW} = D_w^{\mathrm{between}} - D_w^{\mathrm{within}}$. For every parcel $w$, we established “community sensitivity” by assessing whether $\Delta_w^{BW}$ differed significantly from zero, with t-statistic $t^{BW}$. Two measures were taken to correct for temporal auto-correlations and to dissociate community sensitivity from temporal proximity (see Sections 2.6.1 and 2.6.3). Firstly, for each raw pairwise distance and its latency, we computed a residual pairwise distance by subtracting the average distance at that latency. Secondly, we compared the t-statistic $t^{BW}$ for subsets of pairwise distances covering different latency ranges ($\tau \ge \tau_{LB}$, with $\tau_{LB} \in \{1, \ldots, 30\}$). For all parcels, significance decreased monotonically as the lower bound $\tau_{LB}$ was raised and shorter latencies were progressively excluded. Thus, each parcel was summarized by the largest value of $\tau_{LB}$ at which $t^{BW}$ was significant, termed $\tau_{sig}$. A high value of $\tau_{sig}$ indicated significance over all latency ranges, both including shorter latencies (low values of $\tau_{LB}$) and excluding them (high values of $\tau_{LB}$). A low value of $\tau_{sig}$ indicated significance only for ranges that included shorter latencies (low values of $\tau_{LB}$).

Fig. 4.

Distribution of sensitivity to temporal community structure. (A) Pairwise distances between object responses (triangular matrices) were corrected for the average auto-correlation, thresholded by latency $\tau_{ij}$ (lower bound $\tau_{LB}$, indicated by shading), and sorted into different subsets, namely within-community pairs (cyan, magenta, orange) and between-community pairs (grey), according to object positions on the modular path (right). For the average signed difference $\Delta^{BW}$, the statistic $t^{BW}$ was computed. (B) Number of parcels with consistent significance up to $\tau_{sig}$ for residual (solid) and raw (dashed) pairwise distances. The average duration of a community visit was $9.4 \pm 0.15$ (gray shading). (C) Representation of ‘temporal communities’ by parcels with consistently significant $\Delta^{BW}$. In 14 parcels (red), between-community pairs are significantly more separable ($t^{BW} > 0$, corrected $p < 0.05$) over all latency ranges whereas, in 13 parcels (blue), within-community pairs are more separable ($t^{BW} < 0$) over all ranges. Labeling indicates parcels in visual cortex (V1, V2, V3, hV4), lateral occipital cortex (LO1, LO2), fusiform and lingual gyrus (Fus, Lin), anterior inferotemporal cortex (AIT), intraparietal sulcus (IPS), superior temporal cortex, supramarginal gyrus, medial frontal cortex, precuneus, insula (Ins), Rolandic operculum, precentral cortex, and frontal pole (FP). (D) Between-community separability $t^{BW}$ for different latency ranges (lower bound $\tau_{LB}$), for ‘community-sensitive’ parcels with positive $t^{BW} > 0$ (red) and negative $t^{BW} < 0$ (blue). The mean and S.D. of separability over all parcels are shown in gray.

When raw pairwise distances were used, almost all parcels (613 out of 758) exhibited significant separability $\Delta_w^{BW}$. When residual pairwise distances were considered, 93 parcels retained significant $\Delta_w^{BW}$ (left margin of Fig. 4B). In 28 of these 93 parcels, between-community separability was higher ($t^{BW} > 0$), and in the remaining 65 parcels it was lower ($t^{BW} < 0$).

This disparity between raw and residual distances shows that community structure is confounded by temporal auto-correlation to a considerable degree. This is also evident from the strong dependence of $t^{BW}$ on the range of temporal latencies, as summarized by $\tau_{sig}$ (Fig. 4B). When only latency ranges including shorter latencies are considered ($\tau_{sig} \ge 5$), many more parcels are consistently significant than when ranges excluding shorter latencies are also considered ($\tau_{sig} > 15$). Applying the strictest criterion and considering only parcels with significant $\Delta_w^{BW}$ (FDR-corrected $p < 0.05$) for all latency bounds $\tau_{LB} \in \{1, \ldots, 30\}$ ($\tau_{sig} = 30$), we obtained 27 parcels that we considered ‘community sensitive’. These parcels are listed in Appendix Table A1 and illustrated in Figures 4C and 4D and in Supplementary Figure S1.

The above analysis yielded interpretable results for strongly structured presentation sequences, in which every object can be objectively assigned to one particular community. When the analysis was repeated for unstructured presentation sequences (by counterfactually assuming a structured sequence and assigning communities accordingly), no systematic results were obtained, as shown in Supplementary Figure S3. Specifically, apparent community sensitivity is observed only when uncorrected distances over low-latency ranges are considered, and correcting for temporal correlations eliminates this spurious sensitivity. The static matrix of average pairwise distances provides an instructive baseline for spurious ‘sensitivity’ that is entirely due to temporal correlations. Apart from very short latencies, the results from this matrix are comparable to results from unstructured sequences, for both positively and negatively community-sensitive parcels (Supplementary Fig. S3B, C). Results for structured sequences are dramatically different (both higher and lower), corroborating the validity of our analysis of community sensitivity.

Fourteen community-sensitive parcels with higher separability of between-community pairs ($\Delta^{BW} > 0$) were located in bilateral occipital regions and in ventral occipitotemporal regions of the right hemisphere (visual cortex, lateral occipital cortex, fusiform and lingual gyrus, anterior inferior temporal cortex, as well as intraparietal cortex and middle frontal cortex; Fig. 4C; Appendix Table A1). Eleven of these parcels were also identity-selective. Thirteen other parcels exhibited significantly lower separability of between-community pairs ($\Delta^{BW} < 0$) and were located in the superior temporal cortex, supramarginal gyrus, insula, operculum, medial frontal cortex, and the frontal pole (Fig. 4C; Appendix Table A1). In this latter group, 12 parcels were not identity-selective.

The respective cortical distributions of the representations of object identity and community membership are compared in Figure 5. The criterion for community-sensitivity was a significantly positive or negative t-score $t^{BW}$, whereas the criterion for identity-selectivity was a significantly positive minimum statistic of classification accuracy $\alpha_{\min}$ (for details, see Kakaei & Braun, 2024). Figure 5C shows the average classification accuracy $\alpha_{\mathrm{identity}}$ as well as $\alpha_{\min}$. Coloring indicates whether parcels combined identity-selectivity with positive community-sensitivity (11 parcels, orange) or negative community-sensitivity (1 parcel, cyan), or whether parcels were either exclusively community-sensitive (3 parcels positively in red, 12 parcels negatively in blue) or exclusively identity-selective (112 parcels, yellow). Of the 124 identity-selective parcels, 12 parcels (approximately 10%) were additionally community-sensitive. Jointly selective/sensitive parcels were most common in the mid-level visual cortex (ventral occipital cortex, lingual and fusiform gyrus) and somewhat less common in the early visual cortex (V1, V2, V3, hV4). Jointly selective/sensitive parcels were largely absent from high-level visual areas in the parietal and frontal cortex (intraparietal sulcus, superior parietal lobule, insula, inferior and medial frontal cortex), but were present in the anterior inferior temporal cortex. The one negatively community-sensitive parcel in the intraparietal sulcus appeared to be an exception. In summary, jointly selective/sensitive parcels were present at all levels of the ventral visual pathway.

Fig. 5.

Comparison of selectivity for object identity and sensitivity to community structure. (A) Anatomical distribution of 142 parcels that are identity-selective, community-sensitive, or both. 11 parcels (orange) are both identity-selective and positively community-sensitive, while 1 parcel (cyan) combines identity-selectivity with negative community-sensitivity. 15 parcels are exclusively community-sensitive, 3 parcels positively (red) and 12 parcels negatively (blue). The remaining 112 parcels (yellow) are exclusively identity-selective. (B) Share of identity and community representation in 142 parcels with significant representation, assigned to 29 topographical regions, as defined by L. Wang et al. (2015). In the right hemisphere, two parcels (428 and 430) are missing because they could not be assigned to any topographical region. Coloring corresponds to (A) and indicates the fraction of voxels from parcels with different selectivity. Visual cortex (V1-hV4), ventral occipital cortex (VO), lateral occipital cortex (LO), lingual and fusiform gyri (LIN/FS), medial temporal areas (MST, hMT), intraparietal sulcus (IPS), superior parietal lobule (SPL), anterior inferior temporal cortex (AIT), insula and supramarginal gyrus (INS/SMG), inferior frontal cortex (IFC), medial frontal cortex (MFC), and frontal pole (FP). (C) Quantitative comparison of selectivity for identity and sensitivity to community over all parcels. Identity-selectivity is quantified either by average classification accuracy $\alpha_{\mathrm{identity}}$ (top) or by the minimum statistic of classification accuracy (bottom). Community-sensitivity is measured by positive or negative values of $t^{BW}$. Significantly sensitive parcels are represented by colored disks, and non-sensitive parcels by grey dots. Coloring corresponds to (A).

Note that the comparison of community-sensitivity and identity-selectivity was skewed by disparate statistical power. The assessment of community sensitivity was based on approximately 225 times fewer observed response distances than the assessment of identity-selectivity (see Section 2.6.4), so its statistical sensitivity was expected to be approximately 15 times lower. Accordingly, if community-sensitivity was detected in only a fraction of identity-selective parcels, this could, in part, have been due to this disparity in statistical power.

Nominally non-identity-selective parcels with positive community-sensitivity were located in the anterior inferior temporal cortex and in the medial frontal cortex. As seen in the top panel of Figure 5C, the average classification accuracy $\alpha_{\mathrm{identity}}$ of these 3 parcels was comparable to that of identity-selective parcels. However, these parcels narrowly missed the minimum-statistic criterion for significance, as seen in the bottom panel. It seems possible that community-sensitivity degraded identity-selectivity in these parcels, in the sense that reduced response distances within a community might also have reduced distances between the different objects of this community.

Non-identity-selective parcels with negative community-sensitivity were located in the insula, the medial frontal cortex, and at the frontal pole. These 11 parcels exhibited no trace of identity-selectivity in terms of either the observer average or the minimum statistics. Negative sensitivity implies that responses to objects from different communities were more similar than responses to objects from the same community. As discussed below, it seems possible that the responses in these areas placed particular emphasis on ‘linking objects’, thereby highlighting the ‘novelty’ or ‘surprise’ associated with the transition to another community and the appearance of unexpected objects.

3.3 Representation of object pairs

Structured presentation sequences consist of different types of object pairs, such as adjacent and non-adjacent pairs, or ‘linking’ pairs (between different communities) and ‘internal’ pairs (within the same community). Thus, it was natural to ask whether different types of object pairs might have contributed differentially to our average measure of “community sensitivity”, $t^{BW}$.

To address this question, we compared the statistical significance of the signed difference in response distances between and within communities for all object pairs, $t^{BW}$, with that for specific types of object pairs: non-adjacent objects in different communities (DN), non-adjacent objects in the same community (SN), adjacent objects in different communities (DA), and adjacent objects in the same community (SA). The results are shown in Figure 6. The separability measure $t^{SA}$ was negatively correlated with $t^{BW}$ ($\rho = -0.91$, $p < 0.01$), whereas $t^{DN}$ was positively correlated with $t^{BW}$ ($\rho = 0.93$, $p \ll 0.01$). The separability measures $t^{DA}$ and $t^{SN}$ were also negatively correlated with $t^{BW}$, though much less strongly ($\rho = -0.15$, $p < 0.01$ and $\rho = -0.19$, $p < 0.01$, respectively). These results were robust and held for all lower temporal bounds $\tau_{LB} \le 30$, except for the correlation between $t^{BW}$ and $t^{DA}$, which held only for $\tau_{LB} \le 28$.

Fig. 6.

Neural representation of different types of object pairs. Pairs of recurring objects may be in the same (S) or different (D) communities, and may occupy adjacent (A) or non-adjacent (N) positions on the path. Differential separability of between- and within-community pairs (measured by the t-score $t^{BW}$) is compared to separability $t^{SA}$ of adjacent objects in the same community (left), $t^{DN}$ of non-adjacent objects in different communities (middle left), $t^{DA}$ of adjacent objects in different communities (middle right), and $t^{SN}$ of non-adjacent objects in the same community (right), for all 758 parcels. Community-sensitive parcels are shown in red or blue (as in Fig. 4C).

These results show that the representation of community structure (indexed by $t^{BW}$) includes a reduced separation of SA pairs (indexed by $t^{SA}$), as well as an increased separation of DN pairs (indexed by $t^{DN}$). Recall that SA (and DA) pairs occur in presentation sequences (with probability 1/60), whereas SN (and DN) pairs never occur. The selective modulation of representational distance for one of the two adjacent (and therefore occurring) pair types appears to be a correlate of temporal community structure. The same can be said for the selective modulation of representational distance for one of the two non-adjacent (and therefore non-occurring) pair types. Furthermore, the correlation between community representation and separation of SA and DN pairs is evident not only in the few parcels meeting the statistical threshold for community sensitivity (red and blue dots in Fig. 6), but also in all other parcels (grey dots in Fig. 6). Thus, reduced separation of SA pairs and increased separation of DN pairs appear to be a general feature of the cortical representation of community structure.
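The transition probability of 1/60 can be reproduced under an explicit assumption: that presentation sequences behave like a random walk on the 15-object modular graph of A. C. Schapiro et al. (2013), in which every object has four neighbours. The graph layout (objects $5c$ to $5c+4$ forming community $c$) and the helper name are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def modular_graph_neighbors(n_comm=3, size=5):
    """Adjacency sets of the assumed modular graph."""
    nbrs = {v: set() for v in range(n_comm * size)}
    for c in range(n_comm):
        first, last = c * size, c * size + size - 1
        for i, j in combinations(range(first, last + 1), 2):
            if (i, j) != (first, last):  # boundary objects of a community are not linked
                nbrs[i].add(j)
                nbrs[j].add(i)
        nxt = ((c + 1) % n_comm) * size  # first object of the next community
        nbrs[last].add(nxt)
        nbrs[nxt].add(last)
    return nbrs


nbrs = modular_graph_neighbors()
n_objects = len(nbrs)
# On a regular graph the stationary distribution of a random walk is uniform (1/15),
# and each step moves to one of the current object's 4 neighbours with equal probability.
p_step = {(i, j): Fraction(1, n_objects) * Fraction(1, len(nbrs[i]))
          for i in nbrs for j in nbrs[i]}
print(len(p_step), set(p_step.values()))  # 60 {Fraction(1, 60)}
```

Under this assumption, each of the 60 directed transitions along the 30 edges (SA or DA pairs) occurs with probability 1/60 per step, while transitions between non-adjacent objects (SN or DN pairs) have probability zero.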

The results described above depend critically on the correction for temporal correlations (Supplementary Fig. S4). Without this correction, the $t^{BW}$ measure for between-community separation is dominated by short-latency pairs. When shorter latencies are excluded ($\tau_{LB} \ge 5$), the correction ceases to make a difference. This again underlines that correcting for average temporal correlations is key to establishing representations of community structure.

3.4 Representational space

A previous study with structured sequences (A. C. Schapiro et al., 2013) reported that within-community distances are typically smaller than between-community distances and illustrated this finding with multidimensional scaling. We sought to replicate this by visualizing the relative proximity of different types of object pairs. To obtain interpretable results, we employed a permutation procedure that allowed us to average proximity matrices over observers (see Section 2 for details). The resulting arrangements exhibited a three-fold rotational symmetry, which is an artifact of this permutation procedure.
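For readers unfamiliar with the visualization step, classical (Torgerson) multidimensional scaling turns a matrix of pairwise response distances into low-dimensional coordinates whose distances approximate the original ones. The sketch below is a generic implementation in Python with NumPy; it does not reproduce the permutation and averaging procedure of Section 2, and the function name `classical_mds` is hypothetical.

```python
import numpy as np


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed an (n, n) distance matrix D in k dimensions."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                      # double-centred squared distances (Gram matrix)
    evals, evecs = np.linalg.eigh(B)                 # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]                # indices of the k largest eigenvalues
    scale = np.sqrt(np.clip(evals[top], 0.0, None))  # clip tiny negative values from rounding
    return evecs[:, top] * scale


# Usage: distances among 15 points in a plane are recovered up to rotation/reflection.
rng = np.random.default_rng(0)
points = rng.normal(size=(15, 2))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
Y = classical_mds(D, k=2)
D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.allclose(D, D_hat))  # True
```

Because only relative distances are preserved, such plots can be rotated or mirrored freely, which is why absolute distances are reported separately.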

For the 14 positively community-sensitive parcels, the relative proximity of different object pairs is illustrated in Figure 7. In all cases except one, objects were clustered by community (i.e., spaced more closely within than between communities), with Temporal-Inf-R-10 providing the most extreme example. Additionally, ‘linking’ objects tended to be positioned differently from internal objects; in all but two cases, they were closer to each other (and to the center) (Calcarine-L-9, Calcarine-R-5, Lingual-L-1, Occipital-Mid-L-4, Occipital-Mid-L-9, Occipital-Inf-L-2, Fusiform-L-2, Fusiform-L-6, Postcentral-R-11, Temporal-Inf-R-10). Exceptions were Frontal-Mid-R-7, where only internal objects clustered by community, and Occipital-Inf-R-2/4, where linking objects were more distant from each other. As these illustrations show only relative distances, Supplementary Figure S6A provides absolute response distances (average and standard error over parcels), separately for internal and linking objects and within and between communities. Response distances between internal objects within the same community corresponded to the grand average over all object pairs, whereas distances between different communities were significantly larger. Additionally, distances between linking and internal (or linking) objects within the same community were significantly smaller. Thus, both clustering by community and the relative proximity of linking objects were statistically significant. On average, this corresponded to the possibility shown schematically in Figure 3A.
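The comparison with the grand average can be illustrated as follows. The sketch groups object pairs by whether they lie within or between communities and whether they involve a linking object, and reports each group's mean distance minus the grand mean over all pairs. The grouping rule, the function name, and the toy distance values are assumptions for illustration, not data from this study.

```python
from itertools import combinations

import numpy as np


def category_deviations(D, comm, linking):
    """Mean distance per pair category minus the grand mean over all object pairs.

    D       : (n, n) symmetric matrix of response distances
    comm    : community label of each object
    linking : True for 'linking' objects, False for internal objects
    """
    D = np.asarray(D, dtype=float)
    pairs = list(combinations(range(len(comm)), 2))
    grand = np.mean([D[a, b] for a, b in pairs])
    groups = {}
    for a, b in pairs:
        where = "within" if comm[a] == comm[b] else "between"
        kind = "linking" if (linking[a] or linking[b]) else "internal"
        groups.setdefault((kind, where), []).append(D[a, b])
    return {key: float(np.mean(vals) - grand) for key, vals in groups.items()}


# Toy example (made-up values): 3 communities of 5; first and last object of each are linking.
comm = np.repeat([0, 1, 2], 5)
linking = np.zeros(15, dtype=bool)
linking[[0, 4, 5, 9, 10, 14]] = True
D = np.full((15, 15), 1.3)  # between-community distance
for a in range(15):
    for b in range(15):
        if comm[a] == comm[b]:
            D[a, b] = 0.8 if (linking[a] or linking[b]) else 1.0
np.fill_diagonal(D, 0.0)

dev = category_deviations(D, comm, linking)
for key in sorted(dev):
    print(key, round(dev[key], 3))
```

In a real analysis, the same grouping would be applied to each parcel's measured distance matrix before averaging over parcels.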

Fig. 7.

Representation of temporal community structure in positively sensitive parcels. Multidimensional reduction of the pairwise distance matrix averaged over path permutations and over observers. Communities are distinguished by color and linking objects by a black outline, as indicated by the path diagram (inset). Fourteen parcels exhibited higher separability between communities than within communities (tBW>0). Identity-selective parcels are marked with .


Results for the 13 negatively community-sensitive parcels are shown in Figure 8. The clustering of internal objects (Frontal-Sup-L-12, Frontal-Sup-Orb-R-3, Frontal-Mid-R-16, Frontal-Med-Orb-R-3, Parietal-Sup-L-8, and Temporal-Sup-R-6) was variable but, when averaged over parcels, internal objects were more distant within than between communities (Supplementary Fig. S6A). Specifically, within the same community, response distances of internal objects to other internal objects (or linking objects) were significantly larger than the grand average over all object pairs, whereas between communities response distances were smaller. On average, this corresponded to the possibility shown schematically in Figure 3B. In six parcels, all linking objects were distant from each other (and from the center), suggesting that the representation in these parcels individuated different transitions between communities (Frontal-Sup-R-19, Frontal-Sup-Orb-R-3, Insula-R-6, Parietal-Sup-L-8, Precuneus-L-12, Putamen-R-5). However, in seven other parcels, linking objects were separated less well than internal objects (Frontal-Sup-L-12, Frontal-Mid-R-16, Rolandic-Oper-L-2, Frontal-Med-Orb-R-3, Insula-R-5, SupraMarginal-R-4, Temporal-Sup-R-6), suggesting that the representation conflated different transitions between communities.

Fig. 8.

Representation of temporal community structure in negatively sensitive parcels. Multidimensional reduction of the pairwise distance matrix averaged over path permutations and observers. Communities are distinguished by color and linking objects by a black outline, as indicated by the path diagram (inset). Thirteen parcels exhibited lower separability between communities than within communities (tBW<0). Identity-selective parcels are marked with .


It is instructive to also compare parcels that were not classified as either identity-selective or community-sensitive. Results for 15 randomly chosen ‘non-selective’ parcels are shown in Supplementary Figure S5. Perhaps not surprisingly, the results were quite heterogeneous and few differences reached statistical significance when averaged over parcels (Supplementary Fig. S6A). However, in several individual parcels, clustering by communities and/or prominent representation of linking objects was evident.

As a final control, we analyzed average pairwise distances in the responses obtained with unstructured sequences. Here, we failed to observe significant deviations from the grand average distance, either for internal and linking objects or within and between communities (Supplementary Fig. S6B). This corroborates that the results obtained with structured sequences were due to the presence of temporal structure and/or temporal communities.

We investigated incidental and automatic learning of regularities and dependencies without an explicit behavioral task (Aslin, 2017; Fiser & Lengyel, 2022; Perruchet, 2019; Perruchet & Pacton, 2006; Saffran & Kirkham, 2018; A. Schapiro & Turk-Browne, 2015). Our aim was to compare the cortical basis of concurrent learning of statistical structure at two timescales, namely, explicit learning to recognize complex objects presented for ~3 s (Cox et al., 2005; Tian & Grill-Spector, 2015; Wallis & Bülthoff, 2001) and implicit learning of task-irrelevant contingencies in the sequence of object presentations (“temporal communities” lasting ~30 s) (Fiser & Aslin, 2002; Kakaei et al., 2021; Miyashita, 1988; Sáringer et al., 2022; Turk-Browne et al., 2005, 2009). Our results show that cortical representations of both object identity and temporal community structure coexist in large parts of the ventral occipitotemporal cortex.

Previous studies have localized view-invariant object representations in inferior temporal cortex (IT) and lateral occipital complex (LOC) (Grill-Spector et al., 2001; Sáry et al., 1993; Van Meel & Op de Beeck, 2020). Single-unit responses in IT of non-human primates reflect the intrinsic contingencies of an invariant representation and correlate closely with recognition performance (Jia et al., 2021; Li & DiCarlo, 2008, 2010, 2012). Human fMRI studies show differential adaptation in IT for congruent and incongruent shapes (Van Meel & Op de Beeck, 2018). In addition, evidence for view-invariant representations has been reported in primary visual cortex (Eger et al., 2008), at more anterior sites such as fusiform gyrus and ventral occipitotemporal cortex (Brants et al., 2016; Visconti di Oleggio Castello et al., 2021), as well as in several areas of the dorsal pathway (Freud et al., 2017; Jeong & Xu, 2016; Konen & Kastner, 2008; Poirier et al., 2006; Visconti di Oleggio Castello et al., 2021).

Our results confirm and extend these previous findings on cortical regions with view-invariant object representations, as described in our companion study (Kakaei & Braun, 2024). In brief, we established cross-validated multivariate representations of object identity for small ‘functional parcels’ (~1.7 cm³ of cortical volume) defined previously by a functional parcellation (MD758; Dornas & Braun, 2018). Parcels in which significant identity information was prevalent (Allefeld et al., 2016) were located in both the ventral and dorsal visual pathways, beginning with early visual areas (V1–hV4) and extending through more anterior parts of ventral occipitotemporal cortex into anterior inferior temporal cortex, as well as to anterior inferior frontal cortex (Kakaei & Braun, 2024).

Our motivation to compare cortical representations of object shape and temporal object sequence derived from classical studies of object recognition in non-human primates (Erickson & Desimone, 1999; Miyashita, 1988). These studies had shown that single neurons in the inferotemporal cortex developed selectivity not only for the identity but also for the presentation order of objects, provided that animals had consistently viewed these objects in the same sequential order. As this order was irrelevant to the animal’s task, the development of a neural representation for sequential order constituted a prototypical instance of incidental or implicit learning.

In extensive subsequent work with “paired-associate tasks”, the sequential order of objects was made task-relevant, so that learning of temporal associations became explicit. Over the course of training, the prevalence of pair-encoding neurons was found to increase in anterior parts of inferotemporal cortex (IT) (Hirabayashi & Miyashita, 2014; Messinger et al., 2001; Naya et al., 2001, 2003). Additionally, neurons in IT were found to encode “object-general semantic value” in the sense of identifying whether a particular object was “familiar” or “novel” (Tamura et al., 2017). Here, we investigated the possibility that such “object-general” information could extend to membership in a “temporal community” of objects.

Previous behavioral studies have shown that humans can implicitly learn spatiotemporal associations between objects and use these regularities to enhance their cognitive performance. Observers can automatically capture spatial (Fiser & Aslin, 2001) and temporal (Fiser & Aslin, 2002; Turk-Browne et al., 2008) regularities as both joint and conditional probabilities of stimulus co-occurrence. This goes beyond simple object-object associations and extends to higher-order association probabilities over multiple objects. Even when the conditional probability between object pairs is uniform and thus uninformative about the underlying association between objects, humans are sensitive to higher-order regularities (higher moments of the conditional probability distribution) (Kahn et al., 2018; Kakaei et al., 2021; Karuza, Kahn, et al., 2017; A. C. Schapiro et al., 2013). This capability for incidental learning of complex regularities can facilitate performance in various domains, including language (e.g., Saffran et al., 1996), motor learning (e.g., Hunt & Aslin, 2001), spatial attention (e.g., Chun & Jiang, 1998; Jiang & Wagner, 2004), and object recognition learning (e.g., Kakaei et al., 2021).

The literature on implicit or explicit learning of temporal associations shows that both domain-specific and domain-general brain regions can be involved (for reviews, see Batterink et al., 2019; Fiser & Lengyel, 2022). Neural correlates of statistical learning range from early domain-specific sensory areas, where spatiotemporal regularities are first extracted, to mid-level sensory areas, where these representations are presumably integrated. In the visual domain, spatiotemporal regularities emerge in lateral and ventral occipitotemporal and parieto-occipital regions in humans (Henin et al., 2021; Karuza, Emberson, et al., 2017; Rosenthal et al., 2016; Turk-Browne et al., 2009) and are observed in inferotemporal and anterior inferotemporal regions in non-human primates (Kaposvari et al., 2018; Meyer et al., 2014; Miyashita, 1988; Sakai & Miyashita, 1991). More abstract and generalized representations of temporal associations have been reported further downstream, in domain-general areas such as the medial temporal lobe, striatum, and frontal regions. Moreover, the majority of studies point to an essential role of the medial temporal lobe (MTL), particularly the hippocampus, in statistical learning (Hindy et al., 2016; Hsieh et al., 2014; A. Schapiro & Turk-Browne, 2015; A. C. Schapiro et al., 2012, 2013, 2016; Schendan et al., 2003; Turk-Browne et al., 2009, 2010). This is especially true when sequences are repeated and when ordinal knowledge is of particular interest to observers (for reviews, see Davachi & DuBrow, 2015; Eichenbaum et al., 2016). MTL appears to be engaged early in the learning process but to disengage as learning progresses, particularly after consolidation. Concurrently, the encoding of statistical knowledge seems to transfer from MTL to the striatal-frontal network (Batterink et al., 2019; Durrant et al., 2013).
Higher cortical regions in insular cortex and prefrontal cortex (PFC), including inferior frontal gyrus (IFG) and medial prefrontal cortex (mPFC), also show sensitivity to statistical regularities, particularly when the complexity increases (Giorgio et al., 2018; Henin et al., 2021; Karlaftis et al., 2019; Kourtzi & Welchman, 2019; A. C. Schapiro et al., 2013; R. Wang et al., 2017).

Here, we adapted the paradigm of A. C. Schapiro et al. (2013) and used object sequences with higher-order “temporal community structure”. In such sequences, pair probabilities are uniform in that every object is succeeded by one of four other objects with equal probability. This avoids the novelty or surprise effects that would arise if some object transitions were more common or rarer than others. We term sequences with temporal communities “strongly structured”, to distinguish them from “unstructured” pseudo-random sequences where every object can be succeeded by any other object (Kakaei et al., 2021).
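The difference between the two sequence types can be sketched as follows. The sketch assumes a Schapiro-style graph (15 objects, three communities of five, four neighbours per object) and a plain random walk for structured sequences; the exact graph, sequence lengths, and any further constraints used in this study are not reproduced, and all function names are hypothetical. Successor frequencies of any object in a structured sequence approach 1/4, whereas an unstructured sequence eventually contains all 15 × 14 ordered transitions.

```python
import random
from collections import Counter


def community_graph(n_comm=3, size=5):
    """Adjacency lists of an assumed Schapiro-style graph in which every object has 4 neighbours."""
    adj = {i: set() for i in range(n_comm * size)}
    for c in range(n_comm):
        first, last = c * size, (c + 1) * size - 1
        for a in range(first, last + 1):
            for b in range(a + 1, last + 1):
                if (a, b) != (first, last):  # the two linking objects are not joined internally
                    adj[a].add(b)
                    adj[b].add(a)
        nxt = ((c + 1) % n_comm) * size  # linking object of the next community
        adj[last].add(nxt)
        adj[nxt].add(last)
    return {i: sorted(nbrs) for i, nbrs in adj.items()}


def structured_sequence(adj, length, rng):
    """Random walk: each object is followed by one of its neighbours with equal probability."""
    seq = [rng.randrange(len(adj))]
    for _ in range(length - 1):
        seq.append(rng.choice(adj[seq[-1]]))
    return seq


def unstructured_sequence(n, length, rng):
    """Pseudo-random order: any other object may follow (immediate repeats excluded)."""
    seq = [rng.randrange(n)]
    for _ in range(length - 1):
        seq.append(rng.choice([j for j in range(n) if j != seq[-1]]))
    return seq


rng = random.Random(1)
adj = community_graph()
walk = structured_sequence(adj, 60000, rng)
succ = Counter(b for a, b in zip(walk, walk[1:]) if a == 0)
total = sum(succ.values())
print({b: round(n / total, 3) for b, n in sorted(succ.items())})  # each close to 0.25

shuffled = unstructured_sequence(len(adj), 20000, rng)
print(len(set(zip(shuffled, shuffled[1:]))))  # number of distinct ordered transitions observed
```

Because every object has the same number of successors, no single transition is more frequent than another, which is the property that keeps first-order surprise constant.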

We studied cortical representations with a “representational similarity analysis” (RSA) approach, which relies on comparing pairwise distances between multivariate BOLD responses to different objects. A difficulty with this approach is that multivariate BOLD patterns are known to be significantly autocorrelated over tens of seconds (Alink et al., 2015; Henriksson et al., 2015), in part due to hemodynamic effects (Friston et al., 1994; Zarahn et al., 1997). Accordingly, it was essential to distinguish between response similarity due to genuine “temporal community” effects and response similarity due to mere temporal proximity (i.e., systematically shorter latencies between objects in the same community) (Cai et al., 2019; Gilron et al., 2016).
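Why temporal proximity can masquerade as community structure is easy to see in a toy model. The sketch assumes multivariate responses that carry no object information at all, only a slow first-order autoregressive drift with coefficient `rho`; the model, parameter values, and function names are hypothetical. Response distances then grow with the latency between presentations, so any set of object pairs that tends to occur at shorter latencies (such as pairs within a community) looks more similar even without a genuine community effect.

```python
import numpy as np


def drifting_patterns(n_trials, n_voxels, rho, rng):
    """Responses with no stimulus information: unit-variance AR(1) drift in every voxel."""
    X = np.empty((n_trials, n_voxels))
    X[0] = rng.normal(size=n_voxels)
    noise_sd = np.sqrt(1.0 - rho ** 2)  # keeps the stationary variance at 1
    for t in range(1, n_trials):
        X[t] = rho * X[t - 1] + noise_sd * rng.normal(size=n_voxels)
    return X


def mean_distance_by_latency(X, max_lag):
    """Mean Euclidean distance between responses separated by 1..max_lag presentations."""
    return np.array([
        np.linalg.norm(X[lag:] - X[:-lag], axis=1).mean()
        for lag in range(1, max_lag + 1)
    ])


rng = np.random.default_rng(0)
X = drifting_patterns(n_trials=2000, n_voxels=100, rho=0.8, rng=rng)
d = mean_distance_by_latency(X, max_lag=10)
# Expected squared distance at lag tau is 2 * n_voxels * (1 - rho**tau), so d rises with latency.
print(np.round(d, 1))
```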

We took two measures to control for this confound and to distinguish between community and latency effects. First, we computed and analyzed ‘residual distances’ by subtracting from each observed distance at a certain latency the average distance at that latency (see Section 2.6.1). Second, we assessed consistency by analyzing and comparing distances in different latency ranges, for example including or excluding short latencies. These measures turned out to be essential, as nearly the entire brain would have spuriously appeared to be ‘community-sensitive’ without them. They also proved effective, as they revealed ‘community-sensitivity’ only in multivariate BOLD responses to “strongly-structured” sequences and not in responses to “unstructured” sequences. Accordingly, we are confident that these measures identify genuine cortical representations of “temporal community”.
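Both measures can be sketched in a few lines. The example assumes one distance value per object pair, each tagged with the latency (in presentations) between the two objects; the toy data are built so that distance depends on latency alone and within-community pairs tend to occur at shorter latencies. Names, latency ranges, and the lower bound `tau_lb` are hypothetical. The raw between-minus-within difference is then spuriously positive, while the latency-wise residuals remove it.

```python
import numpy as np


def residual_distances(dist, latency):
    """Subtract from each distance the mean distance of all pairs at the same latency."""
    dist = np.asarray(dist, dtype=float)
    latency = np.asarray(latency)
    resid = np.empty_like(dist)
    for tau in np.unique(latency):
        at_tau = latency == tau
        resid[at_tau] = dist[at_tau] - dist[at_tau].mean()
    return resid


rng = np.random.default_rng(0)
n_pairs = 400
between = rng.random(n_pairs) < 0.5
# Assumed toy design: between-community pairs occur at latencies 2..10, within-community at 1..5.
latency = np.where(between, rng.integers(2, 11, n_pairs), rng.integers(1, 6, n_pairs))
dist = 1.0 - 0.8 ** latency  # distance driven by latency alone (no community effect)

raw_diff = dist[between].mean() - dist[~between].mean()
resid = residual_distances(dist, latency)
resid_diff = resid[between].mean() - resid[~between].mean()
print(f"raw: {raw_diff:.3f}, residual: {resid_diff:.3f}")

# Consistency check: repeat the analysis with short latencies excluded.
tau_lb = 5  # hypothetical lower bound on latency
keep = latency >= tau_lb
resid_lb = residual_distances(dist[keep], latency[keep])
b_keep = between[keep]
resid_lb_diff = resid_lb[b_keep].mean() - resid_lb[~b_keep].mean()
```

With real data, a genuine community effect would survive both the residual correction and the restriction to longer latencies, whereas a pure latency effect survives neither.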

Our analysis of multivariate BOLD responses in 758 ‘functional parcels’ revealed two functionally and anatomically distinct kinds of ‘community-sensitivity’ (see also Fig. 3). The first kind—termed positively-sensitive—showed greater similarity of responses within communities than between communities and was observed mostly in domain-specific, visual brain regions. The second kind—termed negatively-sensitive—exhibited lesser similarity of responses within communities and was observed mostly in domain-general areas. We now discuss these two groups in more detail.
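How the sign of a between-versus-within contrast sorts parcels into the two kinds can be sketched with a Welch two-sample t-score computed over residual distances. This is an illustrative stand-in only: the actual definition of tBW and of the significance criterion is given in Section 2, and the threshold value, function names, and simulated data below are hypothetical.

```python
import numpy as np


def t_between_within(resid, between):
    """Welch t-score for mean(between-community) minus mean(within-community) residual distance."""
    resid = np.asarray(resid, dtype=float)
    between = np.asarray(between, dtype=bool)
    b, w = resid[between], resid[~between]
    se = np.sqrt(b.var(ddof=1) / b.size + w.var(ddof=1) / w.size)
    return (b.mean() - w.mean()) / se


def classify(t, threshold=2.0):
    """Positive: larger distances between communities; negative: larger distances within."""
    if t > threshold:
        return "positively community-sensitive"
    if t < -threshold:
        return "negatively community-sensitive"
    return "not community-sensitive"


rng = np.random.default_rng(1)
between = np.repeat([True, False], 200)
positive_parcel = rng.normal(size=400) + 0.5 * between  # larger distances between communities
negative_parcel = rng.normal(size=400) - 0.5 * between  # larger distances within communities
for resid in (positive_parcel, negative_parcel):
    t = t_between_within(resid, between)
    print(round(t, 2), classify(t))
```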

Positively community-sensitive parcels—where response distances were smaller for objects within than between communities—were located almost exclusively in ventral occipitotemporal cortex, with seven parcels in the left hemisphere (Calcarine-337, Occipital-Inf-424, Occipital-Mid-400 and -405, Fusiform-432 and -436, Lingual-363) and five parcels in the right hemisphere (Calcarine-344, Occipital-Inf-428 and -430, Temporal-Inf-751 and -755). Thus, community-sensitive parcels spanned the range of ventral occipitotemporal cortex that also contained parcels selective for object identity. Almost all positively community-sensitive parcels also exhibited significant identity-selectivity. Although the pattern of relative response distances was somewhat heterogeneous (Fig. 7), some significant trends emerged: response distances between ‘internal’ objects were above average between communities, whereas distances between ‘linking’ objects were below average both within and between communities (Supplementary Fig. S6).

While positively community-sensitive parcels comprised only a small fraction of identity-selective parcels (11 of 124 parcels), this small overlap may understate the true situation. As our paradigm was considerably less sensitive to community structure than to object identity, a number of ‘false negatives’ was to be expected. Had the two statistical sensitivities been comparable, the overlap between the two groups might well have been larger.

These results are consistent with earlier findings that early and mid-level visual areas are sensitive to temporal regularities and can flexibly alter their activity pattern to represent the temporal context (Henin et al., 2021; Karuza, Emberson, et al., 2017; Rosenthal et al., 2016; Turk-Browne et al., 2009). They are also consistent with the classical observation that representations of temporal association develop conjointly with representations of object identity (Erickson & Desimone, 1999; Miyashita, 1988).

In contrast to numerous earlier studies (Hindy et al., 2016; Hsieh et al., 2014; A. Schapiro & Turk-Browne, 2015; A. C. Schapiro et al., 2012, 2013, 2016; Schendan et al., 2003; Turk-Browne et al., 2009, 2010), we failed to observe positive community-sensitivity in the medial temporal lobe (MTL). We do not consider this a contradiction, as our analysis did not focus on MTL and our parcellation included only six parcels in this region (2 × Hippocampus, 2 × Perirhinal, 2 × Amygdala). Moreover, MTL is thought to engage early in the learning process, and the memory engram is thought to be transferred to the striatum after consolidation. As our observations spanned multiple days, memory consolidation could already have occurred after the first session, which could also explain our failure to observe any community-sensitivity in the MTL. Interestingly, we did observe (negative) community-sensitivity in one parcel of the putamen (Putamen-R-5).

Negatively community-sensitive parcels were located in domain-general cortex, including the temporal cortex (Temporal-Sup-R-650), parietal cortex (Parietal-Sup-L-499, Supramarginal-R-540, Precuneus-L-570), superior frontal cortex (Sup-Frontal-L-45 and -R-69, Sup-Frontal-Orbit-77, Sup-Frontal-Med-R-248), and middle and inferior frontal cortex (Mid-Frontal-R-118, Insula-R-270 and -271, and Rolandic-Oper-L-187). Apart from Parietal-Sup-L-499, none of these parcels exhibited a significant representation of object identity, further strengthening the dissociation between negative and positive community representations.

These findings are consistent with previous reports that implicit learning paradigms can engage parieto-frontal, fronto-striatal, and/or ventral attention networks (Batterink et al., 2019). More generally, prefrontal cortex (PFC) is thought to reflect higher-order statistics of events (Henin et al., 2021) and decision strategies adopted by observers (Giorgio et al., 2018; Karlaftis et al., 2019; Kourtzi & Welchman, 2019; R. Wang et al., 2017). Orbitofrontal cortex (OFC) is thought to be engaged when more abstract representations or ‘cognitive maps’ are required (Behrens et al., 2018; Christophel et al., 2017; Knudsen & Wallis, 2022; Rusu & Pennartz, 2020; Schuck et al., 2016; Wilson et al., 2014). Insula and inferior frontal gyrus are thought to be engaged by working memory tasks, especially under conditions of high load (Rottschy et al., 2012), and to contribute to goal-directed behavior by interacting with the hippocampus in the medial temporal lobe (Rusu & Pennartz, 2020). Moreover, when objects are viewed in temporally structured sequences, responses in insula and inferior frontal gyrus are suppressed for expected objects (Ferrari et al., 2022). Interestingly, this ‘expectation suppression’ arises earlier there than in the occipitotemporal visual areas (see also Weilnhammer et al., 2021).

The negative community-sensitivity observed both here and in previous studies (A. C. Schapiro et al., 2013) is consistent with “context-specific maps” that individuate objects in a given community, without necessarily identifying either the community or objects in other communities. When the context changes, such a map could be reused to individuate objects in the new community. This would be similar to the invariant response patterns in different environments exhibited by grid cells (Constantinescu et al., 2016; Doeller et al., 2010; Fyhn et al., 2007).

In summary, our results demonstrate incidental learning of temporal associations at all levels of the ventral visual pathway—from the primary visual cortex to the anterior inferior temporal cortex—at the time-scales of both object presentations (seconds) and of temporal contingencies in the object sequence (tens of seconds). This functional overlap suggests that the visual hierarchy develops convergent representations (Grill-Spector & Weiner, 2014) that integrate information from a range of time-scales. It seems likely that such convergent representations contribute to context-dependent enhancement of recognition performance. Our findings confirm the classical observation of a conjoint development of representations of object identity and temporal association (Erickson & Desimone, 1999; Miyashita, 1988).

In the domain-general cortex (superior temporal, parietal, frontal, and insular), representations of higher-order temporal context were also evident, but without any stable representations of object identity. In particular, the ‘linking objects’ that marked the transitions between temporal communities in structured presentation sequences tended to be represented distinctly. Thus, our findings suggest that both the ventral occipitotemporal cortex and domain-general cortex could be in a position to contribute to “structural learning” (Kemp & Tenenbaum, 2008; Tenenbaum et al., 2011) and the development of causal insight and understanding (Lake et al., 2017; Shafto et al., 2011).

Code for direct linear discriminant analysis and prevalence inference is available at github.com/cognitive-biology/DLDA. MR data will be made available upon request.

Ehsan Kakaei: Conceptualization, data curation, formal analysis, visualization, and writing of original draft. Jochen Braun: Conceptualization, linear algebra, formal analysis, supervision, and reviewing & editing.

The authors are not aware of any competing interests.

We thank Claus Tempelmann, Martin Kanowski, and Denise Scheermann at the Magnetic Resonance Imaging Laboratory of the Department of Neurology of Otto-von-Guericke University, Magdeburg. We are grateful to Oliver Speck for providing essential support and balanced perspective. We also thank Stepan Aleshin for helpful discussions and constructive comments. This study was funded by the federal state Saxony-Anhalt and the European Structural and Investment Funds (ESF, 2014-2020), project number ZS/2016/08/80645, as part of doctoral program ABINEP (Analysis, Imaging and Modelling of Neuronal Processes).

Supplementary material for this article is available with the online version here: https://doi.org/10.1162/imag_a_00278

Albers, K. J., Ambrosen, K. S., Liptrot, M. G., Dyrby, T. B., Schmidt, M. N., & Mørup, M. (2021). Using connectomics for predictive assessment of brain parcellations. Neuroimage, 238, 118170. https://doi.org/10.1016/j.neuroimage.2021.118170
Alink, A., Walther, A., Krugliak, A., van den Bosch, J. J., & Kriegeskorte, N. (2015). Mind the drift: Improving sensitivity to fMRI pattern information by accounting for temporal pattern drift. BioRxiv, 032391. https://doi.org/10.1101/032391
Allefeld, C., Görgen, K., & Haynes, J.-D. (2016). Valid population inference for information-based imaging: From the second-level t-test to prevalence inference. Neuroimage, 141, 378–392. https://doi.org/10.1016/j.neuroimage.2016.07.040
Aslin, R. N. (2017). Statistical learning: A powerful mechanism that operates by mere exposure. WIREs Cogn Sci, 8(1–2), e1373. https://doi.org/10.1002/wcs.1373
Batterink, L. J., Paller, K. A., & Reber, P. J. (2019). Understanding the neural bases of implicit and statistical learning. Top Cogn Sci, 11(3), 482–503. https://doi.org/10.1111/tops.12420
Beckmann, C. F., & Smith, S. M. (2004). Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Trans Med Imaging, 23(2), 137–152. https://doi.org/10.1109/TMI.2003.822821
Behrens, T. E., Muller, T. H., Whittington, J. C., Mark, S., Baram, A. B., Stachenfeld, K. L., & Kurth-Nelson, Z. (2018). What is a cognitive map? Organizing knowledge for flexible behavior. Neuron, 100(2), 490–509. https://doi.org/10.1016/j.neuron.2018.10.002
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Roy Statist Soc Ser B, 57(1), 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
Bi, Y., Wang, X., & Caramazza, A. (2016). Object domain and modality in the ventral visual pathway. Trends Cogn Sci, 20(4), 282–290. https://doi.org/10.1016/j.tics.2016.02.002
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10(4), 433–436. https://doi.org/10.1163/156856897X00357
Brants, M., Bulthé, J., Daniels, N., Wagemans, J., & Op de Beeck, H. P. (2016). How learning might strengthen existing visual object representations in human object-selective cortex. Neuroimage, 127, 74–85. https://doi.org/10.1016/j.neuroimage.2015.11.063
Cai, M. B., Schuck, N. W., Pillow, J. W., & Niv, Y. (2019). Representational structure or task structure? Bias in neural representational similarity analysis and a Bayesian method for reducing bias. PLoS Comput Biol, 15(5), e1006299. https://doi.org/10.1371/journal.pcbi.1006299
Christophel, T. B., Klink, P. C., Spitzer, B., Roelfsema, P. R., & Haynes, J.-D. (2017). The distributed nature of working memory. Trends Cogn Sci, 21(2), 111–124. https://doi.org/10.1016/j.tics.2016.12.007
Chun, M. M., & Jiang, Y. (1998). Contextual cueing: Implicit learning and memory of visual context guides spatial attention. Cogn Psychol, 36(1), 28–71. https://doi.org/10.1006/cogp.1998.0681
Constantinescu, A. O., O’Reilly, J. X., & Behrens, T. E. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292), 1464–1468. https://doi.org/10.1126/science.aaf0941
Conway, C. M., & Christiansen, M. H. (2005). Modality-constrained statistical learning of tactile, visual, and auditory sequences. J Exp Psychol Learn Mem Cognit, 31(1), 24–39. https://doi.org/10.1037/0278-7393.31.1.24
Cox, D. D., Meier, P., Oertelt, N., & DiCarlo, J. J. (2005). ‘Breaking’ position-invariant object recognition. Nat Neurosci, 8(9), 1145–1147. https://doi.org/10.1038/nn1519
Davachi, L., & DuBrow, S. (2015). How the hippocampus preserves order: The role of prediction and context. Trends Cogn Sci, 19(2), 92–99. https://doi.org/10.1016/j.tics.2014.12.004
DiCarlo, J. J., Zoccolan, D., & Rust, N. C. (2012). How does the brain solve visual object recognition? Neuron, 73(3), 415–434. https://doi.org/10.1016/j.neuron.2012.01.010
Doeller, C. F., Barry, C., & Burgess, N. (2010). Evidence for grid cells in a human memory network. Nature, 463(7281), 657–661. https://doi.org/10.1038/nature08704
Dornas, J. V., & Braun, J. (2018). Finer parcellation reveals detailed correlational structure of resting-state fMRI signals. J Neurosci Meth, 294, 15–33. https://doi.org/10.1016/j.jneumeth.2017.10.020
Durrant, S. J., Cairney, S. A., & Lewis, P. A. (2013). Overnight consolidation aids the transfer of statistical knowledge from the medial temporal lobe to the striatum. Cerebral Cortex, 23(10), 2467–2478. https://doi.org/10.1093/cercor/bhs244
Eger, E., Ashburner, J., Haynes, J.-D., Dolan, R. J., & Rees, G. (2008). fMRI activity patterns in human LOC carry information about object exemplars within category. J Cogn Neurosci, 20(2), 356–370. https://doi.org/10.1162/jocn.2008.20019
Eichenbaum, H., Amaral, D. G., Buffalo, E. A., Buzsáki, G., Cohen, N., Davachi, L., Frank, L., Heckers, S., Morris, R. G., Moser, E. I., Nadel, L., O’Keefe, J., Preston, A., Ranganath, C., Silva, A., & Witter, M. (2016). Hippocampus at 25. Hippocampus, 26(10), 1238–1249. https://doi.org/10.1002/hipo.22616
Erickson, C. A., & Desimone, R. (1999). Responses of macaque perirhinal neurons during and after visual stimulus association learning. J Neurosci, 19(23), 10404–10416. https://doi.org/10.1523/JNEUROSCI.19-23-10404.1999
Ferrari, A., Richter, D., & de Lange, F. P. (2022). Updating contextual sensory expectations for adaptive behavior. J Neurosci, 42(47), 8855–8869. https://doi.org/10.1523/JNEUROSCI.1107-22.2022
Fiser, J., & Aslin, R. N. (2001). Unsupervised statistical learning of higher-order spatial structures from visual scenes. Psychol Sci, 12(6), 499–504. https://doi.org/10.1111/1467-9280.00392
Fiser, J., & Aslin, R. N. (2002). Statistical learning of higher-order temporal structure from visual shape sequences. J Exp Psychol Learn Mem Cognit, 28(3), 458–467. https://doi.org/10.1037/0278-7393.28.3.458
Fiser, J., & Aslin, R. N. (2005). Encoding multielement scenes: Statistical learning of visual feature hierarchies. J Exp Psychol Gen, 134(4), 521–537. https://doi.org/10.1037/0096-3445.134.4.521
Fiser, J., & Lengyel, G. (2022). Statistical learning in vision. Annu Rev Vis Sci, 8, 265–290. https://doi.org/10.1146/annurev-vision-100720-103343
Freud, E., Culham, J. C., Plaut, D. C., & Behrmann, M. (2017). The large-scale organization of shape processing in the ventral and dorsal pathways. eLife, 6, e27576. https://doi.org/10.7554/eLife.34464
Friston, K. J., Jezzard, P., & Turner, R. (1994). Analysis of functional MRI time-series. Hum Brain Mapp, 1(2), 153–171. https://doi.org/10.1002/hbm.460010207
Fyhn, M., Hafting, T., Treves, A., Moser, M.-B., & Moser, E. I. (2007). Hippocampal remapping and grid realignment in entorhinal cortex. Nature, 446(7132), 190–194. https://doi.org/10.1038/nature05601
Gauthier, I., & Tarr, M. J. (2016). Visual object recognition: Do we (finally) know more now than we did? Annu Rev Vis Sci, 2, 377–396. https://doi.org/10.1146/annurev-vision-111815-114621
Gheysen, F., Van Opstal, F., Roggeman, C., Van Waelvelde, H., & Fias, W. (2011). The neural basis of implicit perceptual sequence learning. Front Hum Neurosci, 5, 137. https://doi.org/10.3389/fnhum.2011.00137
Gilron, R., Rosenblatt, J. D., & Mukamel, R. (2016). Addressing the “problem” of temporal correlations in MVPA analysis. In 2016 International Workshop on Pattern Recognition in Neuroimaging (PRNI), Trento, Italy (pp. 1–4). IEEE. https://doi.org/10.1109/PRNI.2016.7552348
Giorgio, J., Karlaftis, V. M., Wang, R., Shen, Y., Tino, P., Welchman, A., & Kourtzi, Z. (2018). Functional brain networks for learning predictive statistics. Cortex, 107, 204–219. https://doi.org/10.1016/j.cortex.2017.08.014
Greve, D. N., & Fischl, B. (2009). Accurate and robust brain image alignment using boundary-based registration. Neuroimage, 48(1), 63–72. https://doi.org/10.1016/j.neuroimage.2009.06.060
Grill-Spector, K., Kourtzi, Z., & Kanwisher, N. (2001). The lateral occipital complex and its role in object recognition. Vision Res, 41(10–11), 1409–1422. https://doi.org/10.1016/S0042-6989(01)00073-6
Grill-Spector, K., & Weiner, K. S. (2014). The functional architecture of the ventral temporal cortex and its role in categorization. Nat Rev Neurosci, 15(8), 536–548. https://doi.org/10.1038/nrn3747
Haxby, J. V. (2012). Multivariate pattern analysis of fMRI: The early beginnings. Neuroimage, 62(2), 852–855. https://doi.org/10.1016/j.neuroimage.2012.03.016
Henin, S., Turk-Browne, N. B., Friedman, D., Liu, A., Dugan, P., Flinker, A., Doyle, W., Devinsky, O., & Melloni, L. (2021). Learning hierarchical sequence representations across human cortex and hippocampus. Sci Adv, 7(8), eabc4530. https://doi.org/10.1126/sciadv.abc4530
Henriksson, L., Khaligh-Razavi, S.-M., Kay, K., & Kriegeskorte, N. (2015). Visual representations are dominated by intrinsic fluctuations correlated between areas. Neuroimage, 114, 275–286. https://doi.org/10.1016/j.neuroimage.2015.04.026
Hindy, N. C., Ng, F. Y., & Turk-Browne, N. B. (2016). Linking pattern completion in the hippocampus to predictive coding in visual cortex. Nat Neurosci, 19(5), 665–667. https://doi.org/10.1038/nn.4284
Hirabayashi, T., & Miyashita, Y. (2014). Computational principles of microcircuits for visual object processing in the macaque temporal cortex. Trends Neurosci, 37(3), 178–187. https://doi.org/10.1016/j.tins.2014.01.002
Hsieh, L.-T., Gruber, M. J., Jenkins, L. J., & Ranganath, C. (2014). Hippocampal activity patterns carry information about objects in temporal context. Neuron, 81(5), 1165–1178. https://doi.org/10.1016/j.neuron.2014.01.015
Hunt, R. H., & Aslin, R. N. (2001). Statistical learning in a serial reaction time task: Access to separable statistical cues by individual learners. J Exp Psychol Gen, 130(4), 658–680. https://doi.org/10.1037/0096-3445.130.4.658
Jenkinson, M., Bannister, P., Brady, M., & Smith, S. (2002). Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage, 17(2), 825–841. https://doi.org/10.1006/nimg.2002.1132
Jenkinson, M., & Smith, S. (2001). A global optimisation method for robust affine registration of brain images. Med Image Anal, 5(2), 143–156. https://doi.org/10.1016/S1361-8415(01)00036-6
Jeong, S. K., & Xu, Y. (2016). Behaviorally relevant abstract object identity representation in the human parietal cortex. J Neurosci, 36(5), 1607–1619. https://doi.org/10.1523/JNEUROSCI.1016-15.2016
Jia, X., Hong, H., & DiCarlo, J. J. (2021). Unsupervised changes in core object recognition behavior are predicted by neural plasticity in inferior temporal cortex. eLife, 10, e60830. https://doi.org/10.7554/eLife.60830
Jiang, Y., & Wagner, L. C. (2004). What is learned in spatial contextual cuing—Configuration or individual locations? Percept Psychophys, 66, 454–463. https://doi.org/10.3758/BF03194893
Kahn, A. E., Karuza, E. A., Vettel, J. M., & Bassett, D. S. (2018). Network constraints on learnability of probabilistic motor sequences. Nat Hum Behav, 2(12), 936–947. https://doi.org/10.1038/s41562-018-0463-8
Kakaei, E., Aleshin, S., & Braun, J. (2021). Visual object recognition is facilitated by temporal community structure. Learn Mem, 28(5), 148–152. https://doi.org/10.1101/lm.053306.120
Kakaei, E., & Braun, J. (2024). Gradual change of cortical representations with growing visual expertise for synthetic shapes. Imaging Neurosci, 2, 1–28. https://doi.org/10.1162/imag_a_00255
Kaposvari, P., Kumar, S., & Vogels, R. (2018). Statistical learning signals in macaque inferior temporal cortex. Cerebral Cortex, 28(1), 250–266. https://doi.org/10.1093/cercor/bhw374
Karlaftis, V. M., Giorgio, J., Vértes, P. E., Wang, R., Shen, Y., Tino, P., Welchman, A. E., & Kourtzi, Z. (2019). Multimodal imaging of brain connectivity reveals predictors of individual decision strategy in statistical learning. Nat Hum Behav, 3(3), 297–307. https://doi.org/10.1038/s41562-018-0503-4
Karuza, E. A., Emberson, L. L., Roser, M. E., Cole, D., Aslin, R. N., & Fiser, J. (2017). Neural signatures of spatial statistical learning: Characterizing the extraction of structure from complex visual scenes. J Cogn Neurosci, 29(12), 1963–1976. https://doi.org/10.1162/jocn_a_01182
Karuza, E. A., Kahn, A. E., Thompson-Schill, S. L., & Bassett, D. S. (2017). Process reveals structure: How a network is traversed mediates expectations about its architecture. Sci Rep, 7(1), 1–9. https://doi.org/10.1038/s41598-017-12876-5
Kemp, C., & Tenenbaum, J. B. (2008). The discovery of structural form. Proc Natl Acad Sci USA, 105(31), 10687–10692. https://doi.org/10.1073/pnas.0802631105
Knudsen, E. B., & Wallis, J. D. (2022). Taking stock of value in the orbitofrontal cortex. Nat Rev Neurosci, 23(7), 428–438. https://doi.org/10.1038/s41583-022-00589-2
Konen, C. S., & Kastner, S. (2008). Two hierarchically organized neural systems for object information in human visual cortex. Nat Neurosci, 11(2), 224–231. https://doi.org/10.1038/nn2036
Kourtzi, Z., & Welchman, A. E. (2019). Learning predictive structure without a teacher: Decision strategies and brain routes. Curr Opin Neurobiol, 58, 130–134. https://doi.org/10.1016/j.conb.2019.09.014
Kravitz, D. J., Saleem, K. S., Baker, C. I., Ungerleider, L. G., & Mishkin, M. (2013). The ventral visual pathway: An expanded neural framework for the processing of object quality. Trends Cogn Sci, 17(1), 26–49. https://doi.org/10.1016/j.tics.2012.10.011
Kriegeskorte, N., & Diedrichsen, J. (2019). Peeling the onion of brain representations. Annu Rev Neurosci, 42, 407–432. https://doi.org/10.1146/annurev-neuro-080317-061906
Kriegeskorte, N., Mur, M., & Bandettini, P. A. (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Front Syst Neurosci, 2, 4. https://doi.org/10.3389/neuro.06.004.2008
Kumar, M., Anderson, M. J., Antony, J. W., Baldassano, C., Brooks, P. P., Cai, M. B., Chen, P.-H. C., Ellis, C. T., Henselman-Petrusek, G., Huberdeau, D., Hutchinson, J. B., Li, Y. P., Lu, Q., Manning, J. R., Mennen, A. C., Nastase, S. A., Richard, H., Schapiro, A. C., Schuck, N. W., … Norman, K. A. (2022). BrainIAK: The brain imaging analysis kit. Apert Neuro, 1(4), 1–19. https://doi.org/10.52294/31bb5b68-2184-411b-8c00-a1dacb61e1da
Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building machines that learn and think like people. Behav Brain Sci, 40, e253. https://doi.org/10.1017/S0140525X16001837
Lengyel, G., Žalalytė, G., Pantelides, A., Ingram, J. N., Fiser, J., Lengyel, M., & Wolpert, D. M. (2019). Unimodal statistical learning produces multimodal object-like representations. eLife, 8, e43942. https://doi.org/10.7554/eLife.43942
Li, N., & DiCarlo, J. J. (2008). Unsupervised natural experience rapidly alters invariant object representation in visual cortex. Science, 321(5895), 1502–1507. https://doi.org/10.1126/science.1160028
Li, N., & DiCarlo, J. J. (2010). Unsupervised natural visual experience rapidly reshapes size-invariant object representation in inferior temporal cortex. Neuron, 67(6), 1062–1075. https://doi.org/10.1016/j.neuron.2010.08.029
Li, N., & DiCarlo, J. J. (2012). Neuronal learning of invariant object representation in the ventral visual stream is not dependent on reward. J Neurosci, 32(19), 6611–6620. https://doi.org/10.1523/JNEUROSCI.3786-11.2012
Logothetis, N. K., & Sheinberg, D. L. (1996). Visual object recognition. Annu Rev Neurosci, 19(1), 577–621. https://doi.org/10.1146/annurev.ne.19.030196.003045
Messinger, A., Squire, L. R., Zola, S. M., & Albright, T. D. (2001). Neuronal representations of stimulus associations develop in the temporal lobe during learning. Proc Natl Acad Sci USA, 98(21), 12239–12244. https://doi.org/10.1073/pnas.211431098
Meyer, T., Ramachandran, S., & Olson, C. R. (2014). Statistical learning of serial visual transitions by neurons in monkey inferotemporal cortex. J Neurosci, 34(28), 9332–9337. https://doi.org/10.1523/JNEUROSCI.1215-14.2014
Miyashita, Y. (1988). Neuronal correlate of visual associative long-term memory in the primate temporal cortex. Nature, 335(6193), 817–820. https://doi.org/10.1038/335817a0
Nastase, S. A., Gazzola, V., Hasson, U., & Keysers, C. (2019). Measuring shared responses across subjects using intersubject correlation. Soc Cogn Affect Neurosci, 14(6), 667–685. https://doi.org/10.1093/scan/nsz037
Naya, Y., Yoshida, M., & Miyashita, Y. (2001). Backward spreading of memory-retrieval signal in the primate temporal cortex. Science, 291(5504), 661–664. https://doi.org/10.1126/science.291.5504.661
Naya, Y., Yoshida, M., Takeda, M., Fujimichi, R., & Miyashita, Y. (2003). Delay-period activities in two subdivisions of monkey inferotemporal cortex during pair association memory task. Eur J Neurosci, 18(10), 2915–2918. https://doi.org/10.1111/j.1460-9568.2003.03020.x
Op de Beeck, H. P., & Baker, C. I. (2010). The neural basis of visual object learning. Trends Cogn Sci, 14(1), 22–30. https://doi.org/10.1016/j.tics.2009.11.002
Patel, A. X., Kundu, P., Rubinov, M., Jones, P. S., Vértes, P. E., Ersche, K. D., Suckling, J., & Bullmore, E. T. (2014). A wavelet method for modeling and despiking motion artifacts from resting-state fMRI time series. Neuroimage, 95, 287–304. https://doi.org/10.1016/j.neuroimage.2014.03.012
Perruchet, P. (2019). Dual nature of anticipatory classically conditioned reactions. In S. Kornblum & J. Requin (Eds.), Preparatory states and processes (pp. 179–198). Psychology Press. https://doi.org/10.4324/9781315792385-9
Perruchet, P., & Pacton, S. (2006). Implicit learning and statistical learning: One phenomenon, two approaches. Trends Cogn Sci, 10(5), 233–238. https://doi.org/10.1016/j.tics.2006.03.006
Poirier, C. C., De Volder, A. G., Tranduy, D., & Scheiber, C. (2006). Neural changes in the ventral and dorsal visual streams during pattern recognition learning. Neurobiol Learn Mem, 85(1), 36–43. https://doi.org/10.1016/j.nlm.2005.08.006
Rosenthal, C. R., Andrews, S. K., Antoniades, C. A., Kennard, C., & Soto, D. (2016). Learning and recognition of a non-conscious sequence of events in human primary visual cortex. Curr Biol, 26(6), 834–841. https://doi.org/10.1016/j.cub.2016.01.040
Rottschy, C., Langner, R., Dogan, I., Reetz, K., Laird, A. R., Schulz, J. B., Fox, P. T., & Eickhoff, S. B. (2012). Modelling neural correlates of working memory: A coordinate-based meta-analysis. Neuroimage, 60(1), 830–846. https://doi.org/10.1016/j.neuroimage.2011.11.050
Rusu, S. I., & Pennartz, C. M. (2020). Learning, memory and consolidation mechanisms for behavioral control in hierarchically organized cortico-basal ganglia systems. Hippocampus, 30(1), 73–98. https://doi.org/10.1002/hipo.23167
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926–1928. https://doi.org/10.1126/science.274.5294.1926
Saffran, J. R., & Kirkham, N. Z. (2018). Infant statistical learning. Annu Rev Psychol, 69, 181. https://doi.org/10.1146/annurev-psych-122216-011805
Sakai, K., & Miyashita, Y. (1991). Neural organization for the long-term memory of paired associates. Nature, 354(6349), 152–155. https://doi.org/10.1038/354152a0
Sáringer, S., Fehér, Á., Sáry, G., & Kaposvári, P. (2022). Online measurement of learning temporal statistical structure in categorization tasks. Mem Cogn, 50(7), 1530–1545. https://doi.org/10.3758/s13421-022-01302-5
Sáry, G., Vogels, R., & Orban, G. A. (1993). Cue-invariant shape selectivity of macaque inferior temporal neurons. Science, 260(5110), 995–997. https://doi.org/10.1126/science.8493538
Schapiro, A., & Turk-Browne, N. (2015). Statistical learning. Brain Mapp, 3, 501–506. https://doi.org/10.1016/B978-0-12-397025-1.00276-1
Schapiro, A. C., Kustner, L. V., & Turk-Browne, N. B. (2012). Shaping of object representations in the human medial temporal lobe based on temporal regularities. Curr Biol, 22(17), 1622–1627. https://doi.org/10.1016/j.cub.2012.06.056
Schapiro, A. C., Rogers, T. T., Cordova, N. I., Turk-Browne, N. B., & Botvinick, M. M. (2013). Neural representations of events arise from temporal community structure. Nat Neurosci, 16(4), 486–492. https://doi.org/10.1038/nn.3331
Schapiro, A. C., Turk-Browne, N. B., Norman, K. A., & Botvinick, M. M. (2016). Statistical learning of temporal community structure in the hippocampus. Hippocampus, 26(1), 3–8. https://doi.org/10.1002/hipo.22523
Schendan, H. E., Searl, M. M., Melrose, R. J., & Stern, C. E. (2003). An fMRI study of the role of the medial temporal lobe in implicit and explicit sequence learning. Neuron, 37(6), 1013–1025. https://doi.org/10.1016/s0896-6273(03)00123-5
Schuck, N. W., Cai, M. B., Wilson, R. C., & Niv, Y. (2016). Human orbitofrontal cortex represents a cognitive map of state space. Neuron, 91(6), 1402–1412. https://doi.org/10.1016/j.neuron.2016.08.019
Shafto, P., Kemp, C., Mansinghka, V., & Tenenbaum, J. B. (2011). A probabilistic model of cross-categorization. Cognition, 120(1), 1–25. https://doi.org/10.1016/j.cognition.2011.02.010
Smith, S. M. (2002). Fast robust automated brain extraction. Hum Brain Mapp, 17(3), 143–155. https://doi.org/10.1002/hbm.10062
Smith, S. M., & Brady, J. M. (1997). SUSAN—a new approach to low level image processing. Int J Comput Vis, 23(1), 45–78. https://doi.org/10.1023/A:1007963824710
Tamura, K., Takeda, M., Setsuie, R., Tsubota, T., Hirabayashi, T., Miyamoto, K., & Miyashita, Y. (2017). Conversion of object identity to object-general semantic value in the primate temporal cortex. Science, 357(6352), 687–692. https://doi.org/10.1126/science.aan4800
Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022), 1279–1285. https://doi.org/10.1126/science.1192788
Tian, M., & Grill-Spector, K. (2015). Spatiotemporal information during unsupervised learning enhances viewpoint invariant object recognition. J Vis, 15(6), 7. https://doi.org/10.1167/15.6.7
Turk-Browne, N. B., Isola, P. J., Scholl, B. J., & Treat, T. A. (2008). Multidimensional visual statistical learning. J Exp Psychol Learn Mem Cogn, 34(2), 399–407. https://doi.org/10.1037/0278-7393.34.2.399
Turk-Browne, N. B., Jungé, J. A., & Scholl, B. J. (2005). The automaticity of visual statistical learning. J Exp Psychol Gen, 134(4), 552–564. https://doi.org/10.1037/0096-3445.134.4.552
Turk-Browne, N. B., Scholl, B. J., Chun, M. M., & Johnson, M. K. (2009). Neural evidence of statistical learning: Efficient detection of visual regularities without awareness. J Cogn Neurosci, 21(10), 1934–1945. https://doi.org/10.1162/jocn.2009.21131
Turk-Browne, N. B., Scholl, B. J., Johnson, M. K., & Chun, M. M. (2010). Implicit perceptual anticipation triggered by statistical learning. J Neurosci, 30(33), 11177–11187. https://doi.org/10.1523/JNEUROSCI.0858-10.2010
Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., & Joliot, M. (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage, 15(1), 273–289. https://doi.org/10.1006/nimg.2001.0978
Van Meel, C., & Op de Beeck, H. P. (2018). Temporal contiguity training influences behavioral and neural measures of viewpoint tolerance. Front Hum Neurosci, 12, 13. https://doi.org/10.3389/fnhum.2018.00013
Van Meel, C., & Op de Beeck, H. P. (2020). An investigation of the effect of temporal contiguity training on size-tolerant representations in object-selective cortex. Neuroimage, 217, 116881. https://doi.org/10.1016/j.neuroimage.2020.116881
Visconti di Oleggio Castello, M., Haxby, J. V., & Gobbini, M. I. (2021). Shared neural codes for visual and semantic information about familiar faces in a common representational space. Proc Natl Acad Sci USA, 118(45), e2110474118. https://doi.org/10.1073/pnas.2110474118
Wallis, G., Backus, B. T., Langer, M., Huebner, G., & Bülthoff, H. (2009). Learning illumination- and orientation-invariant representations of objects through temporal association. J Vis, 9(7), 6. https://doi.org/10.1167/9.7.6
Wallis, G., & Bülthoff, H. H. (2001). Effects of temporal association on recognition memory. Proc Natl Acad Sci USA, 98(8), 4800–4804. https://doi.org/10.1073/pnas.071028598
Wang, L., Mruczek, R. E., Arcaro, M. J., & Kastner, S. (2015). Probabilistic maps of visual topography in human cortex. Cerebral Cortex, 25(10), 3911–3931. https://doi.org/10.1093/cercor/bhu277
Wang, R., Shen, Y., Tino, P., Welchman, A. E., & Kourtzi, Z. (2017). Learning predictive statistics: Strategies and brain mechanisms. J Neurosci, 37(35), 8412–8427. https://doi.org/10.1523/JNEUROSCI.0144-17.2017
Weilnhammer, V., Fritsch, M., Chikermane, M., Eckert, A.-L., Kanthak, K., Stuke, H., Kaminski, J., & Sterzer, P. (2021). An active role of inferior frontal cortex in conscious experience. Curr Biol, 31(13), 2868–2880.e8. https://doi.org/10.1016/j.cub.2021.04.043
Weiner, K. S., & Zilles, K. (2016). The anatomical and functional specialization of the fusiform gyrus. Neuropsychologia, 83, 48–62. https://doi.org/10.1016/j.neuropsychologia.2015.06.033
Wilson, R. C., Takahashi, Y. K., Schoenbaum, G., & Niv, Y. (2014). Orbitofrontal cortex as a cognitive map of task space. Neuron, 81(2), 267–279. https://doi.org/10.1016/j.neuron.2013.11.005
Ye, J., Xiong, T., & Madigan, D. (2006). Computational and theoretical analysis of null space and orthogonal linear discriminant analysis. J Mach Learn Res, 7(43), 1183–1204. http://jmlr.org/papers/v7/ye06a.html
Yu, H., & Yang, J. (2001). A direct LDA algorithm for high-dimensional data—With application to face recognition. Pattern Recogn, 34(10), 2067–2070. https://doi.org/10.1016/S0031-3203(00)00162-X
Zarahn, E., Aguirre, G. K., & D’Esposito, M. (1997). Empirical analyses of BOLD fMRI statistics. Neuroimage, 5(3), 179–197. https://doi.org/10.1006/nimg.1997.0263
Zhang, Y., Brady, M., & Smith, S. (2001). Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imag, 20(1), 45–57. https://doi.org/10.1109/42.906424

Appendix

Appendix Table A1.

List of community-selective parcels and their anatomical region.

Region | Parcel No. | MNI (x y z) | Community tBW | Identity α (%) | Topographic assignment
Middle frontal | 109 | 48 -3 56 | 2.9 | |
Calcarine | 337 | -12 -99 -5 | 5.2 | 13.8 | V1d
Calcarine | 344 | -85 | 5.2 | 13.8 | V1v
Lingual | 363 | -12 -65 -5 | 3.8 | 9.5 | V3v
Occipital (middle) | 400 | -38 -86 | | 12.0 | LO2
Occipital (middle) | 405 | -16 -100 | 6.3 | 14.0 | V2d
Occipital (inferior) | 424 | -31 -83 -8 | 4.5 | 12.6 |
Occipital (inferior) | 428 | 36 -85 -7 | 5.9 | 13.2 | hV4
Occipital (inferior) | 430 | 42 -73 -9 | 3.6 | 11.4 | hV4
Fusiform | 432 | -27 -71 -11 | 4.9 | 12.2 | VO2
Fusiform | 436 | -31 -53 -13 | 4.1 | 8.8 | PHC1
Postcentral | 483 | 51 -26 47 | 3.8 | |
Temporal (inferior) | 751 | 51 -67 -8 | 5.1 | | AIT
Temporal (inferior) | 755 | 46 -53 -11 | 5.5 | 9.7 | AIT
Superior frontal | 45 | -23 58 23 | -4.1 | |
Superior frontal | 69 | 21 63 | -6.4 | |
Superior frontal (orbital) | 77 | 23 61 -5 | -6.8 | |
Middle frontal | 118 | 44 35 34 | -3.8 | |
Rolandic operculum | 187 | -42 -25 18 | -3.2 | |
Superior frontal (medial orbital) | 248 | 65 -9 | -4.6 | |
Insula | 270 | 39 20 -4 | -3.6 | |
Insula | 271 | 41 | -3.8 | |
Parietal (superior) | 499 | -28 -69 50 | -4.2 | 8.9 |
Supramarginal | 540 | 56 -44 29 | -4.4 | |
Precuneus | 570 | -3 -62 46 | -3.5 | |
Putamen | 619 | 26 11 | -4.2 | |
Temporal (superior) | 650 | 63 -8 | -3.5 | |

Numerical parcel ID, geometrical centroid x/y/z in MNI, between-community separability tBW, identity classification α, and topographical assignment, if any.
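The table reports tBW only as a signed between-community separability index: positive for parcels where responses to objects of the same community lie closer together, negative where they lie farther apart. As a hedged illustration of how such a signed index can arise from pairwise response distances, the Python sketch below computes a Welch-style t statistic comparing between-community and within-community Euclidean distances. The function names, the Euclidean metric, the Welch formula and the toy data are assumptions for illustration only; the sketch is not the authors' procedure, omits their control for temporal proximity, and treats pairwise distances as independent, which they are not.

```python
import math
from itertools import combinations
from statistics import mean, variance


def euclidean(u, v):
    """Euclidean distance between two equal-length response vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def between_within_t(responses, community):
    """Hypothetical signed separability index (Welch-style t).

    responses: dict mapping object id -> multivariate response (list of floats)
    community: dict mapping object id -> community label
    Returns a positive value when within-community distances are smaller than
    between-community distances, and a negative value when they are larger.
    Needs at least two within- and two between-community pairs.
    """
    within, between = [], []
    for a, b in combinations(sorted(responses), 2):
        d = euclidean(responses[a], responses[b])
        # Sort each pairwise distance by whether the two objects share a community.
        (within if community[a] == community[b] else between).append(d)
    # Standard error of the difference of means, without assuming equal variances.
    se = math.sqrt(variance(between) / len(between) + variance(within) / len(within))
    return (mean(between) - mean(within)) / se


# Toy example (hypothetical data): two well-separated clusters of 2-D responses.
responses = {
    "o1": [0.0, 0.1], "o2": [0.2, 0.0], "o3": [0.1, 0.3],
    "o4": [5.0, 5.1], "o5": [5.2, 4.9], "o6": [4.9, 5.0],
}
community = {"o1": "A", "o2": "A", "o3": "A", "o4": "B", "o5": "B", "o6": "B"}
print(round(between_within_t(responses, community), 1))
```

With communities that coincide with the response clusters the index is positive; relabelling the same objects so that each community pairs one member of each cluster makes within-community distances large and the index negative, mirroring the two signs reported in the table.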

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.

Supplementary data