Language comprehension involves the grouping of words into larger multiword chunks. This grouping is required to recode information into sparser representations that mitigate memory limitations and counteract forgetting. It has been suggested that electrophysiological processing time windows constrain the formation of these units. Specifically, the period of rhythmic neural activity (i.e., low-frequency neural oscillations) may set an upper limit of 2–3 sec. Here, we assess whether the learning of new multiword chunks is also affected by this neural limit. We applied an auditory statistical learning paradigm of an artificial language while manipulating the duration of to-be-learned chunks. Participants listened to isochronous sequences of disyllabic pseudowords from which they could learn hidden three-word chunks based on transitional probabilities. We presented chunks of 1.95, 2.55, and 3.15 sec, created by varying the pause interval between pseudowords. In a first behavioral experiment, we tested learning using an implicit target detection task. We found better learning for chunks of 2.55 sec as compared to longer durations, in line with the proposed upper temporal limit. In a second experiment, we recorded participants' electroencephalogram during the exposure phase to use frequency tagging as a neural index of statistical learning. Extending the behavioral findings, the results show a significant decline in neural tracking for chunks exceeding 3 sec as compared to both shorter durations. Overall, we suggest that language learning is constrained by endogenous time constraints, possibly reflecting electrophysiological processing windows.

Statistical learning describes the ability to detect structure from probability distributions in the environment. Infants, adults, and nonhuman animals have been found to learn regular patterns (for reviews, see Frost, Armstrong, & Christiansen, 2019; Santolin & Saffran, 2018). In humans, statistical learning is essential for language acquisition (Saffran, 2003; Saffran, Aslin, & Newport, 1996). For instance, learners extract word boundaries from transitional probabilities (TPs) between syllables. Yet, sentence-level language comprehension not only requires segmentation of continuous speech into words but also involves the grouping of these words into larger multiword chunks. This is necessary because individual words easily fade from memory owing to interference from new input. Recoding them into multiword chunks allows for a compressed, more durable representation that can be passed on to a higher level of linguistic abstraction for further processing (Christiansen & Chater, 2016). Thus far, little research has addressed the statistical learning of multiword chunks. There is a body of research investigating multiword units (e.g., collocations) in the learning of natural languages (for reviews, see Sonbul & Siyanova-Chanturia, 2024; Pellicer-Sánchez, 2019; Kuiper, Columbus, & Schmitt, 2009). In addition to statistical learning per se, these studies involve semantic, syntactic, and pragmatic features of the to-be-learned items; learning success in a second language is additionally influenced by vocabulary knowledge, congruency, and semantic transparency.

In adult language, multiword chunks are constrained in time: The longer a chunk, the more likely its termination. Cognitive and neural factors limit their duration to ∼2–3 sec (Pöppel, 1997, 2009; Baddeley, Thomson, & Buchanan, 1975). For example, in language production, median utterance duration is ∼2.6 sec (Vollrath, Kazenwadel, & Krüger, 1992). During language comprehension, evoked components in the EEG related to chunking (closure positive shift, CPS; Bögels, Schriefers, Vonk, & Chwilla, 2011a, 2011b; Steinhauer, Alter, & Friederici, 1999) occur every 2.7 sec (Roll, Lindgren, Alter, & Horne, 2012). Likewise, processing of grammatical agreement takes longer when distances between dependent words exceed 2.7 sec (Schremm, Horne, & Roll, 2016; Experiment 1) and shows altered neural responses above versus below this distance (Roll, Gosselke, Lindgren, & Horne, 2013). Behaviorally, listeners indicate chunk boundaries during naturalistic listening at an average interval of 2.55 sec (SD = 1.2 sec; Vetchinnikova, Konina, Williams, Mikušová, & Mauranen, 2023). Given the apparent regularity of chunk durations, it has been proposed that a periodic electrophysiological substrate constrains the duration of chunks (Henke & Meyer, 2021). In particular, cycles of delta-band oscillations (< 4 Hz) span several seconds and may in principle be long enough to serve multiword chunking. Given the suggested electrophysiological nature of the constraint, it would be expected to hold across languages; however, this remains to be shown with cross-linguistic data. Ding, Melloni, Zhang, Tian, and Poeppel (2016) showed that neural activity at frequencies within the delta band tracked syntactic structure that could not be inferred from speech acoustics.
Using isochronous stimuli consisting of four-syllable sentences, each composed of two two-syllable phrases, they showed that neural activity tracked the rates of both phrases and sentences (i.e., there was a peak at those frequencies in the power spectrum of the magnetoencephalography [MEG] signal). Likewise, delta-band phase was found to predict the termination of a multiword chunk (Meyer, Henry, Gaston, Schmuck, & Friederici, 2016) and to enforce chunking after 2.7 sec (Henke & Meyer, 2021).

If electrophysiological processing time windows indeed constrain the formation of multiword chunks, this might influence not only language comprehension but also the learning of chunks in a new language or a language-like stimulus. That is, learning also requires the integration of individual words into larger chunks. Therefore, we here assess whether the learning of new multiword chunks in an artificial language is affected by these temporal constraints on processing. We tested this via auditory statistical learning of an artificial language, where we manipulated the duration of to-be-learned speech segments. Previous research investigating the effect of presentation rate on auditory statistical learning has reported inconsistent results (Emberson, Conway, & Christiansen, 2011; Conway & Christiansen, 2009). Most importantly, the investigated presentation rates either allowed the to-be-learned items to be easily assembled within the suggested processing window of 2–3 sec (i.e., > 3 items per second) or produced learnable segments of irregular duration that might, in many cases, exceed the processing window. As a result, these studies do not allow conclusions about the extent to which statistical learning is influenced by temporal processing windows. On the basis of the previously proposed constraint on language comprehension, we tested to-be-learned segments of durations shorter than 2 sec, longer than 3 sec, or around 2.7 sec (here taken as the constraint duration based on prior electrophysiological work; Henke & Meyer, 2021; Roll et al., 2012).

We hypothesized that chunk learning should be better when durations fall within a neural processing window (i.e., 2–3 sec) as compared to when they exceed this duration. Although there is likely also a lower bound, our predictions were less strong for shorter durations. Specifically, our shortest condition falls just below this processing window, and processing may remain unimpaired as long as chunks fall within the delta-band range (Lo, 2021). Yet, Roll and colleagues (2012) did not find chunking-related neural responses for chunks of 1.8 sec during language comprehension, suggesting that chunk learning may also be impaired when the chunk duration falls below the processing window. We first conducted a behavioral experiment to assess learning through an implicit target detection task (Pinto, Prior, & Zion Golumbic, 2022; Lukics & Lukács, 2021; Batterink, Reber, Neville, & Paller, 2015; Franco, Eberlen, Destrebecqz, Cleeremans, & Bertels, 2015) and an explicit chunk recognition task (Pinto et al., 2022; Henin et al., 2021; Batterink & Paller, 2017; Batterink, Reber, Neville, et al., 2015). In a second experiment, we recorded participants' EEG during the exposure phase. Isochrony of the artificial language allowed us to use frequency tagging as a neural index of statistical learning (e.g., Pinto et al., 2022; Henin et al., 2021; Batterink & Paller, 2017). When participants learn the chunk structure by extracting co-occurrence probabilities, neural tracking at the chunk frequency should emerge. Overall, our results will contribute to understanding temporal limits on language processing, possibly reflecting an oscillatory period that serves as a pacemaker for (artificial) language learning.

Methods

Participants

Seventy-two native speakers of German participated in the behavioral online study (49 female and 23 male; mean age = 24.9 years, SD = 4.8 years). Participants were recruited on Prolific (www.Prolific.co) and tested on the online platform Gorilla (Anwyl-Irvine, Massonnié, Flitton, Kirkham, & Evershed, 2020). Based on the experimental design, full counterbalancing across conditions was achieved after 36 participants. To increase statistical power despite the greater noise expected from online measurement (e.g., Anwyl-Irvine, Dalmaijer, Hodges, & Evershed, 2021), we decided to sample twice as many participants. A further 14 participants were excluded because of missing data for an entire subblock (n = 7), a low hit rate in the target detection task (< 80%, n = 5; see Results section below), or missing data in a subblock after outlier removal (n = 2). Excluded participants were replaced to maintain the counterbalanced design (see Materials section). Participants self-reported right-handedness. The experiment was approved by the local ethics committee, and informed consent was obtained before participation.

Materials

We employed an auditory statistical learning paradigm that manipulates the duration of to-be-learned chunks. We created three artificial languages. Each language consisted of four multiword chunks, in turn composed of three pseudowords each (see Table 1). We used disyllabic pseudowords that were composed of German consonant–vowel (CV) syllables. Each CV pair occurred only once across all artificial languages. Pseudowords were chosen not to resemble real words. Individual pseudowords were synthesized using WaveNet voices (van den Oord et al., 2016) provided by the Google Cloud Text-to-Speech API. After synthesis, the PSOLA algorithm (Moulines & Charpentier, 1990) implemented in Praat (Boersma, 2001) was used to adjust all words to a uniform duration of 550 msec. To enhance naturalness, this value was chosen based on the mean duration of the raw synthesized pseudowords (M = 536 msec, SD = 40 msec). Pseudowords were then combined into three-word chunks.

Table 1.

Multiword Chunks for All Three Languages

Language 1          Language 2          Language 3
KEGI PASI NAMO      LEBU REKA NULO      SETU PUFO BIKO
MIHA WURO FUNE      GOWA LAKU KIDE      JOFI DUPE LUWO
RIBA TEDO NOWE      BOTA HEGA TIHU      HIGE FEPO MASU
WIME MUPI RAJU      JALI GUSA SODI      DANI TORU BEFA

We manipulated the duration of the to-be-learned chunks by inserting a silent pause interval of 100, 300, or 500 msec between the pseudowords; pause durations were chosen to be equally spaced between conditions and to allow chunk durations to range around the proposed time constraint of 2–3 sec. In particular, we wanted the medium duration to fall within the suggested processing window (i.e., between 2.55 and 2.7 sec). This resulted in chunk durations of 1.95, 2.55, and 3.15 sec (see Figure 1B). Subsequently, chunks were combined into streams of the artificial language. Critically, chunk boundaries were not marked acoustically but probabilistically: TPs were high between pseudowords within a chunk (TP = 1; i.e., pseudowords always occurred together within a chunk), whereas TPs were low across chunk boundaries (TP = 0.33; i.e., the last pseudoword of a chunk was equiprobably followed by various other words; Figure 1A). The same chunk never occurred twice in a row. To avoid potential cueing of chunks at the beginning and end of the audio, the volume was ramped for the duration of one pseudoword including the subsequent pause (i.e., 650, 850, and 1050 msec) at both ends. This ramping makes the first and last word of the stream less audible and thus avoids cueing participants to the chunk boundaries at the beginning and end of the stream (cf. Henin et al., 2021; Ordin, Polyanskaya, Soto, & Molinaro, 2020).
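The probabilistic structure of the streams can be made concrete with a short sketch. The Python code below is a hypothetical helper, not the authors' stimulus scripts (which used WaveNet synthesis and Praat); assuming the chunk inventory of Language 1 (Table 1), it builds a stream in which within-chunk TPs are 1, boundary TPs are approximately 0.33, and the same chunk never occurs twice in a row:

```python
import random

# Hypothetical chunk inventory modeled on Language 1 (Table 1).
CHUNKS = [
    ["KEGI", "PASI", "NAMO"],
    ["MIHA", "WURO", "FUNE"],
    ["RIBA", "TEDO", "NOWE"],
    ["WIME", "MUPI", "RAJU"],
]

def build_stream(n_chunks, pause_ms, word_ms=550, seed=0):
    """Concatenate chunks so that within-chunk TPs are 1.0 and
    boundary TPs are ~0.33 (the next chunk is drawn from the other
    three); the same chunk never occurs twice in a row."""
    rng = random.Random(seed)
    order, prev = [], None
    for _ in range(n_chunks):
        nxt = rng.choice([c for c in range(len(CHUNKS)) if c != prev])
        order.append(nxt)
        prev = nxt
    words = [w for c in order for w in CHUNKS[c]]
    # Each word occupies word_ms plus the silent pause that follows it,
    # so chunk duration = 3 * (word_ms + pause_ms).
    chunk_s = 3 * (word_ms + pause_ms) / 1000
    return words, chunk_s

words, dur = build_stream(48, pause_ms=100)
# 3 * (550 + 100) msec = 1.95 sec per chunk, as in the short condition
```

The three pause conditions (100, 300, 500 msec) thus yield the reported chunk durations of 1.95, 2.55, and 3.15 sec.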

Figure 1.

Visualization of the materials and experimental manipulation. (A) TPs within and across chunks; words belonging to a chunk are color-coded. (B) Duration manipulation of the auditory materials. (C) Time course of the experimental procedure highlighting different tasks across the online and EEG experiment. (D) Illustration of the target detection task with the target word TEDO. (E) Example of the chunk recognition task.


Procedure

During the exposure phase, participants listened to an isochronous sequence of disyllabic pseudowords that composed an artificial language. Hidden within the sequence were three-word chunks. In a block design, we manipulated the duration of the to-be-learned chunks. Each duration was presented in four consecutive subblocks to test changes in learning with exposure, resulting in 12 subblocks across chunk durations. Each chunk occurred 12 times per subblock. Subblock duration differed across chunk durations (93.6, 122.4, or 151.2 sec). The order of chunk durations was counterbalanced across participants. In addition, each artificial language was presented in each chunk duration across participants, leading to a fully counterbalanced design after 36 participants. Between subblocks, participants took a break of at least 1 min. Participants were allowed longer breaks between duration blocks.
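The subblock durations follow directly from this design: with four chunks occurring 12 times each, a subblock contains 48 chunk tokens. A minimal Python check (the analyses themselves were run in R):

```python
# Each subblock contains 4 chunks x 12 occurrences = 48 chunk tokens;
# chunk duration = 3 * (550 msec word + pause), i.e., 1.95/2.55/3.15 sec.
chunk_durations = {"short": 1.95, "medium": 2.55, "long": 3.15}
tokens_per_subblock = 4 * 12
subblock_s = {name: round(tokens_per_subblock * d, 1)
              for name, d in chunk_durations.items()}
# -> {'short': 93.6, 'medium': 122.4, 'long': 151.2}
```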

During the initial exposure phase, we tested learning with an implicit target detection task (Lukics & Lukács, 2021). Participants were given a target word before each subblock and had to press a button as quickly as possible whenever they heard this word (Figure 1D). Within each chunk duration, participants had to detect 48 targets across the four subblocks. The target word was always the second word of a chunk. In this way, detection should accelerate once participants learn the co-occurrence of the first and second word of a chunk, which makes the target more predictable. Targets varied across the four subblocks and came from a different chunk each time. The order of target words for each language was kept constant across durations to avoid potential differences. The target word never occurred within the first chunk of a stream because of the acoustic ramping. Note that, as opposed to Lukics and Lukács (2021), we did not include a fully random condition, as we were merely interested in differences between chunk durations.

After exposure, participants additionally completed an explicit chunk recognition task (Pinto et al., 2022; Henin et al., 2021; Batterink & Paller, 2017; Batterink, Reber, Neville, et al., 2015) and rated the certainty of their answers (Ordin et al., 2020). Participants heard two words of the language that could compose a chunk (i.e., a pair with high TP, such as RIBA TEDO), a part-chunk crossing a boundary with low TP (e.g., NOWE KEGI), or a nonchunk that never occurred in this order in the audio stream and thus violated the learned regularities (e.g., RIBA NAMO; the first word of one chunk followed by the third word of another chunk). They had to indicate whether the pair was part of the previously heard language (Figure 1E). We chose to use word pairs to double the number of unique trials without repetitions. Across durations, word pairs were presented with a pause of 300 msec (i.e., the medium duration). Participants completed 16 trials (eight chunks, four part-chunks, and four nonchunks; balancing the rates of correct and incorrect items, as only chunks were considered part of the language) and had 5 sec to answer via button press. After responding, participants were asked to rate the certainty of their answer on a 4-point Likert scale from guessed to remembered. Afterward, participants received feedback on their answers; that is, “correct” or “incorrect” appeared on their screen. An overview of the experimental procedure is displayed in Figure 1C.
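The composition of the test items can be illustrated schematically. The helper below is a hypothetical sketch (the exact pairings used in the study are not fully specified in the text); it derives the three pair types from a four-chunk inventory, yielding the reported 8 chunk pairs, 4 part-chunks, and 4 nonchunks:

```python
def make_test_pairs(chunks):
    """chunks: list of four three-word chunks (lists of strings).
    Chunk pairs have TP = 1 (word1-word2 and word2-word3 of each chunk),
    part-chunks cross a chunk boundary (TP ~ 0.33), and nonchunks pair
    words that never occurred adjacently in the stream (TP = 0)."""
    n = len(chunks)
    chunk_pairs = [(c[0], c[1]) for c in chunks] + [(c[1], c[2]) for c in chunks]
    part_chunks = [(chunks[i][2], chunks[(i + 1) % n][0]) for i in range(n)]
    non_chunks = [(chunks[i][0], chunks[(i + 1) % n][2]) for i in range(n)]
    return chunk_pairs, part_chunks, non_chunks

lang1 = [["KEGI", "PASI", "NAMO"], ["MIHA", "WURO", "FUNE"],
         ["RIBA", "TEDO", "NOWE"], ["WIME", "MUPI", "RAJU"]]
pairs, parts, nons = make_test_pairs(lang1)
# 8 chunk pairs + 4 part-chunks + 4 nonchunks -> 16 trials in total
```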

At the end of the online experiment, participants were additionally asked about their explicit knowledge of the multiword chunks. To that end, participants first received a debriefing about the fact that three words always occurred together. Then, they were presented with the words of each language and were asked to intuitively group the words into chunks.

Data Acquisition and Analysis

Data for the behavioral study were acquired on Gorilla (Anwyl-Irvine et al., 2020) via keys on participants' computer keyboards. Behavioral responses were analyzed in R (R Core Team, 2021).

The overall accuracy of the target detection task served to evaluate attention. For the target detection task, we additionally analyzed the RTs of accurate button presses to target words. A button press was considered correct when it occurred within 1.7 sec of stimulus onset. This duration was chosen to cover the target word and the next word within the medium duration condition. We used the Mahalanobis distance based on duration and subblock to remove outliers in RTs (4% of data removed). Participants who ended up with empty subblocks without any data points (i.e., no remaining response to a target) were replaced to achieve a counterbalanced design (n = 2); the procedure was repeated for the full sample. For statistical analysis, we applied a linear mixed-effects model to RTs (in seconds) with the factors Duration and Target Number as well as their interaction term. As random factors, we included intercepts and slopes for duration within participants and random intercepts for subblock (to account for different target items across subblocks), resulting in the maximal converging model without singularity issues. We kept the target number as a numeric predictor (from 1 to 48; cf. Schneider, Weng, Hu, & Qi, 2022), while we dummy-contrast-coded the factor Duration with the medium duration as the reference level.
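The outlier step can be sketched as follows. Because the Mahalanobis distance of a single RT within a duration × subblock cell reduces to an absolute z-score, the simplified Python version below (an assumption about the exact implementation, which the authors ran in R) drops responses beyond a distance threshold within each cell:

```python
from statistics import mean, stdev
from collections import defaultdict

def remove_rt_outliers(trials, threshold=3.0):
    """Illustrative simplification: within each duration x subblock
    cell, drop RTs whose one-dimensional Mahalanobis distance from the
    cell mean, |RT - mean| / SD, exceeds the threshold. 'trials' is a
    list of dicts with keys 'duration', 'subblock', and 'rt'."""
    cells = defaultdict(list)
    for t in trials:
        cells[(t["duration"], t["subblock"])].append(t)
    kept = []
    for group in cells.values():
        rts = [t["rt"] for t in group]
        if len(rts) < 2:
            kept.extend(group)
            continue
        m, s = mean(rts), stdev(rts)
        for t in group:
            # Keep the trial when the SD is zero or the distance is small.
            if s == 0 or abs(t["rt"] - m) / s <= threshold:
                kept.append(t)
    return kept
```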

For the explicit chunk recognition task, we first tested accuracy for each duration level, collapsed across pair types, against chance (50% accuracy) by means of a Wilcoxon signed-rank test. Performance above chance may be taken as evidence that participants learned the underlying structure. To further investigate condition differences, we analyzed responses with a generalized linear mixed-effects model with fixed effects for Duration and Pair Type, their interaction term, and random intercepts for participants. For the certainty ratings, we subset only the correct answers and applied a proportional odds logistic regression with the factors Duration and Pair Type and their interaction term. Specifically, the proportional odds logistic regression allows modeling of an ordinal response with ordered levels, such as those from the Likert scale used here. Statistical comparison of all models was performed with a likelihood ratio test (LRT) of the model without the effect in question against the model including this effect. Because scoring of the explicit chunk reproduction revealed that the task was too difficult, we did not analyze it further. An overview of the statistical models across experiments can be found in Table 2.
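The LRT logic can be written out compactly. For the degrees of freedom occurring here (1 and 2), the chi-square survival function has a closed form, so a minimal Python helper (illustrative only; the analyses were run in R, and the log-likelihood values below are made-up inputs) is:

```python
import math

def lrt_pvalue(loglik_reduced, loglik_full, df):
    """Likelihood ratio test: 2 * (logLik_full - logLik_reduced) is
    compared against a chi-square distribution with df equal to the
    number of parameters dropped. Closed-form survival functions are
    used for df = 1 and df = 2 (sufficient for the contrasts here)."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))   # chi2 sf, 1 df
    elif df == 2:
        p = math.exp(-stat / 2.0)              # chi2 sf, 2 df
    else:
        raise NotImplementedError("only df in {1, 2} implemented")
    return stat, p

# Hypothetical log-likelihoods differing by 5.585 reproduce the Duration
# effect reported in the Results: stat = 11.17, p rounds to .004.
stat, p = lrt_pvalue(loglik_reduced=-105.585, loglik_full=-100.0, df=2)
```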

Table 2.

Overview of Statistical Models across Experiments

Online experiment
 Target detection task: lmer(RT ∼ Duration × Target Number + (Duration|participant) + (1|subblock))
 Chunk recognition task: glmer(response ∼ Duration × Pair Type + (1|participant))
 Certainty ratings: polr(certainty ∼ Duration × Pair Type)

EEG experiment
 Chunk recognition task: glm(response ∼ Duration × Pair Type)
 Familiarity ratings: polr(familiarity ∼ Duration × Pair Type)

Results

After excluding participants with an accuracy < 80%, the remaining participants showed a high hit rate in the target detection task (mean accuracy = 96%, SD = 4%), indicating that they paid attention to the audio stream.

In the RTs of the target detection task, we found a main effect of Duration, χ2(2) = 11.17, p = .004, a main effect of Target Number, χ2(1) = 16.08, p < .001, and an interaction of Duration and Target Number, χ2(2) = 6.26, p = .044 (see Table 3 for the full regression model). The interaction revealed a significant difference between the medium and long duration with a shallower slope for the long duration (i.e., less acceleration in RTs; difference estimate = 0.0006, SE = 0.0003, z = 2.41, p = .016; Figure 2B). There was no statistically significant difference in RTs between the medium and short duration (difference estimate = 0.0005, SE = 0.0003, z = 1.79, p = .073).

Table 3.

Regression Table of Full Model of the Target Detection Task

Dependent variable: RTs

Predictor                            Estimate   Confidence Interval   t Value
(Intercept)                          0.68       0.63, 0.73            28.30
Duration [short]                     −0.04      −0.06, −0.02          −3.42
Duration [long]                      −0.02      −0.04, 0.01           −1.30
Target Number                        −0.002     −0.003, −0.001        −4.92
Duration [short] × Target Number     0.0005     −0.0, 0.001           1.79
Duration [long] × Target Number      0.0006     0.0001, 0.0011        2.41

Marginal R²/conditional R²: .022/.408
Figure 2.

Results of the target detection task. (A) Mean RTs across target numbers and chunk durations. (B) Group-level regression of the full model including the interaction, displaying the predicted RTs in the target detection task across target numbers; shaded areas represent the 95% confidence intervals.


For analysis of the chunk recognition task including the certainty ratings, one additional participant was removed because of missing response data for one entire duration (i.e., all responses were recorded as timeouts). Overall accuracy in the chunk recognition task was above chance (50%) for all durations (Wilcoxon test; all zs > 5.23, all ps < .001; see Table 4), suggesting that participants had learned the co-occurrence of pseudowords. For individual responses, logistic regression revealed a main effect of Pair Type (i.e., chunk, part-chunk, or nonchunk), χ2(2) = 176.49, p < .001, but no effect of Duration, nor any interaction between the two (both ps > .05). Post hoc pairwise comparisons based on estimated marginal means with Tukey correction showed significant differences between all pair types (Figure 3A, left), with chunks showing higher accuracy than part-chunks (difference estimate = 1.14, SE = 0.09, z = 13, p < .001) and nonchunks (difference estimate = 0.54, SE = 0.09, z = 6.16, p < .001), and part-chunks being recognized less accurately than nonchunks (difference estimate = −0.6, SE = 0.1, z = −6.07, p < .001). The model further improved when adding the ordered factor Certainty, χ2(3) = 75.23, p < .001. Post hoc comparison showed that this was driven by increased accuracy for the highest certainty rating (i.e., remembered) compared to all other ratings (all zs > 5.78, all ps < .001, Tukey-corrected), indicating that participants' response accuracy increased when they remembered the word pair. This suggests that the certainty ratings indeed relate to performance and that participants may have acquired some conscious knowledge about the chunks. Next, we tested whether certainty ratings on correct answers could be predicted by duration and pair type. The ordinal regression model showed significant main effects of Pair Type (LRT = 120.52, p < .001) and Duration (LRT = 8.05, p = .018) without an interaction between factors (p = .66). Post hoc pairwise comparisons based on estimated marginal means with Tukey correction showed significant differences between all pair types (participants were more certain about chunks than part-chunks and nonchunks, both zs > 4.75 and ps < .001, and about nonchunks as compared to part-chunks, z = 5.68 and p < .001; Figure 3B, left) and indicated higher certainty in the short compared to the medium duration (difference estimate = 0.28, SE = 0.1, z = 2.83, p = .013). There were no significant differences between the short and long durations or between the medium and long durations (both ps > .05).

Table 4.

Overview of the Accuracy in the Explicit Chunk Recognition Task

Duration    Online Experiment (n = 71)    EEG Experiment (n = 36)
Short       62% (14)                      54% (12)
Medium      60% (13)                      55% (12)
Long        60% (13)                      51% (9)

Mean (SD) accuracy in percent.

Figure 3.

Behavioral results of (A) the chunk recognition task and (B) the certainty ratings for the online experiment and the familiarity ratings for the EEG experiment; error bars represent the 95% confidence interval. ***p < .001, *p < .05, Tukey-corrected. For the recognition task, only chunks were considered as part of the language.


Intermediate Discussion

In the behavioral target detection task, participants' RTs accelerated more strongly with exposure for chunk durations of 2.55 sec than for longer chunks. This decrease in RTs is in line with the idea that participants respond faster when they learn the co-occurrence of words (Lukics & Lukács, 2021). RT changes for the medium duration did not, however, differ from those for the shorter chunk duration. We cautiously interpret these results as indicating best learning for chunks around the proposed time constraint of 2.7 sec relative to chunks exceeding 3 sec. However, we cannot draw strong conclusions about the lower end. Note that because we were merely interested in how RTs changed across the learning phase, our interpretation is not influenced by the fact that participants generally tended to respond more quickly to faster stimulation rates. In addition, the raw data suggest an overall increase in RTs across the experiment. This may be driven by the fact that RTs slowed down in the last block across all conditions (see Figure 2A).

Learning differences between chunk durations were revealed only during the online target detection task, not in the offline chunk recognition task. That is, we did not observe any differences between chunk durations in the recognition task. Yet, overall above-chance performance in all three durations suggests that participants had learned all durations comparably well after the full exposure. Note that this may have been fostered by the feedback that participants received on their responses. In addition, participants were generally more certain about their answers for short as compared to medium durations, independent of pair type.

These diverging results may relate to previously reported differences between measures of statistical learning, with implicit measures tapping knowledge during the learning process and explicit measures tapping the learning outcome (Batterink, Reber, Neville, et al., 2015; Batterink, Reber, & Paller, 2015; for an extended consideration, see General Discussion section). In line with our results, the target detection task has been found to be more sensitive to learning than the chunk recognition task (Batterink, Reber, Neville, et al., 2015).

Overall, the results of the different behavioral tasks do not allow for strong conclusions. Yet, the target detection task hints at best learning for chunk durations aligned with the suggested processing window around 2.7 sec (i.e., the medium duration) as compared to longer durations. Hence, although there seems to be an upper limit to the duration of new multiword chunks, the results are inconclusive about shorter durations. To draw further conclusions, we therefore conducted a second experiment using a neural marker of learning as a more sensitive method. To that end, we recorded participants' EEG during the exposure phase and used neural frequency tagging as an index of learning (e.g., Pinto et al., 2022; Henin et al., 2021; Batterink & Paller, 2017). Here, neural tracking at the chunk frequency should emerge when participants learn the chunk structure by extracting TPs. Following the behavioral results, we predicted higher neural tracking (i.e., better learning) for the medium as compared to the long duration. For the short duration, tracking may not differ from the medium duration, confirming the behavioral results, or we may observe reduced tracking, in line with a lower bound of the processing window, given the higher sensitivity of the neural response.
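In this frequency-tagging design, the tagged frequencies are fully determined by the stimulus timing: the word rate is 1/SOA and the chunk rate is one third of that. A small Python sketch of the expected spectral peaks, assuming the 550-msec word duration and the three pause conditions:

```python
WORD_MS = 550  # uniform pseudoword duration in msec

def tagging_freqs(pause_ms):
    """Return (word_hz, chunk_hz) for a given silent pause; each word
    occupies word + pause, and a chunk spans three such intervals."""
    soa_s = (WORD_MS + pause_ms) / 1000
    return round(1 / soa_s, 3), round(1 / (3 * soa_s), 3)

expected = {p: tagging_freqs(p) for p in (100, 300, 500)}
# short: word 1.538 Hz, chunk 0.513 Hz; medium: 1.176 / 0.392 Hz;
# long: 0.952 / 0.317 Hz -- all chunk rates fall in the delta band (< 4 Hz)
```

Learning should thus surface as a spectral peak at the chunk rate, a frequency that is absent from the acoustics because chunk boundaries are not marked acoustically.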

Methods

Participants

For the EEG experiment, we recruited 36 participants (18 female and 18 male; mean age = 26 years, SD = 4 years). Three additional participants were excluded from the analysis because of a technical issue during the recording (n = 1) or low accuracy in a word recognition task (n = 2; see Results section below), casting doubt on their attention. Participants were right-handed (Oldfield, 1971; mean lateralization quotient = 94, SD = 9.3) and reported no history of neurological, hearing, or language disorder. The local ethics committee approved the experiment, and participants provided informed consent before their participation.

Materials and Procedure

Materials were identical to those of Experiment 1. The procedure also largely remained the same, starting with an exposure phase followed by an explicit chunk recognition task and familiarity ratings for each duration, with only minor modifications to adapt to the EEG measurement (see Figure 1).

In the EEG experiment, we recorded participants' neural activity during the exposure phase. We could not apply the target detection task because of the muscle artifacts that would accompany the motor response. To nevertheless monitor attention, participants performed a word recognition task after each subblock. Participants were auditorily presented with six pseudowords (three words of the language and three words that were not part of any artificial language); after each word, they had 3 sec to indicate via button press whether the word was part of the language. Because of a technical issue, one participant had breaks shorter than the intended 1 min between subblocks.

The chunk recognition task was entirely the same as in Experiment 1. However, we changed the certainty ratings to familiarity ratings (Batterink & Paller, 2017; Batterink, Reber, & Paller, 2015). That is, after each trial in the chunk recognition task, participants were asked to rate their familiarity with the word pairs. Answers were given on a 3-point Likert scale (i.e., unknown/guessed/known), given the configuration of the button box. Finally, feedback on the chunk recognition and reproduction tasks was dropped for the EEG experiment.

Data Acquisition

For the EEG experiment, continuous data were recorded from 63 Ag/AgCl electrodes mounted in an elastic cap (ANT Neuro GmbH) sampled at 500 Hz. Vertical and horizontal eye movements were monitored by bipolar electrodes placed around the eyes. Data were referenced online to the left mastoid (A1), and an additional electrode on the sternum served as ground. Behavioral responses were recorded via a button box.

Analysis

In the EEG experiment, the overall accuracy of the word recognition task served to evaluate attention. Analysis of the behavioral data from the explicit chunk recognition task and the familiarity ratings was the same as for the online experiment; given the nature of the familiarity ratings, both correct and incorrect recognition trials were included. That is, we analyzed the familiarity ratings independently of whether a word pair was previously correctly recognized/rejected to preserve the variance of responses and the informativity of the task (i.e., familiar pairs were likely also indicated as part of the language, whereas unfamiliar pairs were likely rejected). Yet, for the chunk recognition task, we only computed a linear model without any random effect term, as random intercepts for participants led to singularity issues (see Table 2).

For preprocessing and statistical analysis of the EEG data, we combined EEGLAB functions (Delorme & Makeig, 2004), functions from Fieldtrip (Oostenveld, Fries, Maris, & Schoffelen, 2011), and custom MATLAB code (The MathWorks, Inc.).

Continuous EEG data were rereferenced offline to the linked mastoids, high-pass filtered at 0.1 Hz with a fourth-order two-pass Butterworth infinite impulse response (IIR) filter, and down-sampled to 250 Hz. Bad channels were removed when the normed joint probability of the average log power surpassed 3 SDs (mean number of removed channels = 2.8, SD = 1.9). Large artifacts were removed based on wavelet-enhanced independent component analysis (ICA; Gabard-Durnam, Mendez Leal, Wilkinson, & Levin, 2018; Castellanos & Makarov, 2006). Then, data were high-pass filtered with a 1-Hz two-pass sixth-order Butterworth IIR filter, which is recommended for improving ICA (Winkler, Debener, Muller, & Tangermann, 2015; Makeig, Bell, Jung, & Sejnowski, 1996). We used MARA (Winkler, Haufe, & Tangermann, 2011) to identify artifactual components. Actual removal of components with an artifact probability > .5 was performed on the 0.1-Hz-filtered data (mean number of rejected components = 28.6, SD = 5.8). Finally, data were rereferenced to the common average of all (remaining) scalp electrodes, and subsequently, bad channels were interpolated using spherical interpolation.

We hypothesized that there would be increased neural synchronization at the frequencies of words—as those are acoustically marked—and chunks—after those have been learned. We first computed the power spectral density of the last ∼89 sec from each exposure block (excluding the last chunk to account for the audio ramping). We varied the exact duration according to the condition to keep data length roughly equal while analyzing a whole number of chunks (88–90 sec; 46 chunks for the short, 35 for the medium, and 28 for the long duration; Ding, 2023). Frequency analysis was computed across subblocks from 0.2 to 1.8 Hz using fast Fourier transform with Hanning tapers and zero padding to the next power of 2. Statistical comparison was performed using a cluster-based permutation test against neighboring frequencies (Maris & Oostenveld, 2007; dependent-samples one-sided t tests, 10,000 permutations, minimum cluster size of 3 channels). For each frequency within each duration, we created a baseline based on the average of the two frequency bins around the target frequency (Ding et al., 2017). Note that this procedure removes the outermost frequencies (i.e., 0.2 and 1.8 Hz) from the analysis. Yet, this should not influence our results given the wide range of analyzed frequencies and the sufficient distance to the frequencies of interest.
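The correspondence between chunk durations and target frequencies, and the neighbor-bin baseline, can be illustrated with a minimal Python sketch (the durations and bin logic are taken from the text; the helper name `neighbor_baseline` is ours):

```python
# Chunk durations (sec) -> occurrence frequencies (Hz): f_chunk = 1 / duration.
# Each chunk contains three disyllabic pseudowords, so f_word = 3 / duration.
durations = [1.95, 2.55, 3.15]

chunk_freqs = [round(1 / d, 2) for d in durations]  # [0.51, 0.39, 0.32]
word_freqs = [round(3 / d, 2) for d in durations]   # [1.54, 1.18, 0.95]

def neighbor_baseline(spectrum, idx):
    """Baseline for one frequency bin: mean of the two adjacent bins."""
    return (spectrum[idx - 1] + spectrum[idx + 1]) / 2

# Toy spectrum: a peak at bin 2 should exceed its neighbor baseline.
spectrum = [1.0, 1.1, 3.0, 0.9, 1.0]
peak_vs_baseline = spectrum[2] - neighbor_baseline(spectrum, 2)  # 3.0 - 1.0 = 2.0
```

Comparing each target bin against the mean of its immediate neighbors controls for broadband power differences across the spectrum.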

We hypothesized that the 1/f trend of the power spectral density might mask synchronization at the low frequencies of the chunks (i.e., 0.32, 0.39, and 0.51 Hz). Yet, methods to separate aperiodic/fractal components from the power spectrum have previously only been tested for frequencies > 1 Hz (e.g., FOOOF, Donoghue et al., 2020, or IRASA, Wen & Liu, 2016) and may not be appropriate below this frequency (i.e., they cannot be applied to our frequencies of interest; Gerster et al., 2022). Hence, in addition to analyzing power, we computed the intertrial phase coherence (ITPC; Pinto et al., 2022; Ordin et al., 2020; Batterink & Paller, 2017; Ding et al., 2017) across chunks. First, the wavelet transform of the continuous signal with added zero padding was computed using Morlet wavelets at frequencies from 0.2 to 1.8 Hz in 0.01-Hz steps, allowing for a sufficiently high frequency resolution to examine the frequencies of interest (i.e., 0.32, 0.39, and 0.51 Hz for chunks and 0.95, 1.18, and 1.54 Hz for words; Figure 4A). Then, the decomposed time–frequency data were segmented into chunks (i.e., different epoch lengths for each duration). The first and last chunks of each block were removed to account for the audio ramping. Over the remaining epochs, we computed the ITPC and then averaged over the entire chunk duration. Statistical comparison was performed analogously to the power spectral density, against a baseline distribution based on the average of the two frequency bins around the target frequency (i.e., ±0.01 Hz).
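ITPC quantifies phase consistency as the magnitude of the mean unit phase vector across epochs. The following is a minimal, self-contained Python sketch on synthetic phase angles (not the actual wavelet pipeline, which was implemented in MATLAB):

```python
import cmath
import random

def itpc(phases):
    """Intertrial phase coherence: magnitude of the mean unit phase vector
    across epochs; 1 = perfect phase locking, values near 0 = random phases."""
    vectors = [cmath.exp(1j * p) for p in phases]
    mean_vector = sum(vectors) / len(vectors)
    return abs(mean_vector)

random.seed(0)
# Phase-locked epochs: a fixed phase plus small jitter -> ITPC near 1.
locked = [0.5 + random.gauss(0, 0.1) for _ in range(100)]
# Unlocked epochs: uniformly random phases -> ITPC near 0.
unlocked = [random.uniform(-cmath.pi, cmath.pi) for _ in range(100)]
```

Here, `itpc(locked)` is close to 1 and `itpc(unlocked)` close to 0, which is why phase locking at a chunk frequency can index learning even when the corresponding power peak is buried under the 1/f trend.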

Figure 4.

Results of the power analysis. (A) Correspondence of chunk durations to their occurrence frequencies. (B) Power spectrum of the EEG during the exposure phase averaged across all electrodes.


Finally, we compared ITPC across chunk durations as we hypothesized that neural synchronization would be highest for chunks that fall within an electrophysiological processing window. Statistical comparison focused a priori on the frequencies of interest that correspond to the different chunk durations (i.e., 0.32, 0.39, and 0.51 Hz). For each duration across participants and electrodes, we extracted the ITPC for the target frequency corresponding to the chunk duration. We then fitted a linear model for the ITPC at each electrode with duration as fixed effect and random intercepts for participants. Post hoc pairwise testing was performed at the electrode that showed a statistically significant effect of duration after Bonferroni correction to account for multiple comparisons across electrodes.
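Bonferroni correction across electrodes simply scales each electrode's uncorrected p value by the number of tests. A minimal sketch (the electrode count of 63 comes from the recording setup; the function name is ours):

```python
n_electrodes = 63

def bonferroni(p_uncorrected, n_tests):
    """Bonferroni-corrected p value, capped at 1."""
    return min(p_uncorrected * n_tests, 1.0)

# A corrected p of .003 at one electrode implies an uncorrected
# p of roughly .003 / 63 at that electrode.
p_corrected = bonferroni(0.003 / 63, n_electrodes)  # ~.003
```

Equivalently, an effect survives correction when its uncorrected p value falls below .05 / 63.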

Results

Behavioral Results

Two participants were removed from analysis because of low accuracy (i.e., < 80%) in the word recognition task. The remaining participants showed high accuracy (mean accuracy = 93%, SD = 4%), indicating that they paid attention to the audio stream during the learning phase.

In the EEG experiment, accuracy in the chunk recognition task (see Table 4) was significantly above chance for the medium duration (one-sample t test; t(35) = 2.51, p = .017), but not for the short, t(35) = 1.79, p = .082, or long, t(35) = 0.71, p = .48, duration. Consistent with the behavioral experiment, we only found a significant main effect of Pair Type on accuracy in the chunk recognition task (LRT = 216.34, p < .001), with significant differences between all comparisons (all ps < .001, Tukey-corrected; Figure 3A, right): Participants showed higher accuracy for chunks as compared to part-chunks (difference estimate = 1.69, SE = 0.13, z = 13.14) as well as nonchunks (difference estimate = 1.17, SE = 0.12, z = 9.49), and nonchunks were recognized more accurately than part-chunks (difference estimate = 0.53, SE = 0.14, z = 3.68). Neither an effect of Duration (p = .38) nor an interaction of Pair Type and Duration (p = .55) was observed. Adding Familiarity did not improve the model (p = .086). Likewise, we only found a significant main effect of Pair Type on familiarity (LRT = 20.10, p < .001), driven by higher familiarity for chunks compared to nonchunks (difference estimate = 0.5, SE = 0.11, z = 4.49, p < .001, Tukey-corrected; Figure 3B, right) as well as for part-chunks compared to nonchunks (difference estimate = 0.33, SE = 0.13, z = 2.56, p = .028).

EEG Results

Statistical analysis of the power spectral density showed a positive cluster at the word frequency for the short (cluster sum t(35) = 448.98, cluster-level p < .001; ranging from 1.53 to 1.54 Hz), medium (cluster sum t(35) = 484.81, cluster-level p < .001; ranging from 1.17 to 1.18 Hz), and long (cluster sum t(35) = 469.25, cluster-level p < .001; ranging from 0.946 to 0.954 Hz) durations over all electrodes, indicating that those frequency bins showed significantly stronger power than their neighbors (Figure 4B) and, thus, tracking of the acoustically marked words.

Analysis of the ITPC revealed increased neural responses to frequencies of words and chunks as well as their harmonics across all chunk durations (Figure 5): For the short duration, we observed eight significant clusters covering frequencies from 0.46–0.54, 0.95–0.99, 1.01–1.03, 1.38–1.65, and 1.67–1.68 Hz (all cluster sum t(35) > 18.66, cluster-level p < .038). For the medium duration, we observed five significant clusters covering frequencies from 0.35–0.41, 0.71–0.82, 1.03–1.3, and 1.32 Hz (all cluster sum t(35) > 103.75, cluster-level p < .025). Furthermore, for the long duration, we observed four significant clusters covering frequencies from 0.29–0.34, 0.60–0.64, 0.83–1.05, and 1.75–1.79 Hz (all cluster sum t(35) > 125.73, cluster-level p < .029).

Figure 5.

Results of the ITPC. (A) ITPC of chunks across frequencies at electrode Cz; bold lines indicate the group mean; and fine lines, individual's ITPC. (B) Topographies of ITPC at target chunk frequencies corresponding to each chunk duration. (C) ITPC at target chunk frequencies at electrode Cz; ***p < .001, Tukey-corrected. (D) Topography of log-transformed F values of the effect of duration on ITPC displayed in C, highlighting the statistically significant electrode Cz.


Planned comparisons of target frequencies across durations revealed a statistically significant effect of Duration at electrode Cz, F(2) = 11.44, p = .003 (Bonferroni-corrected). Note that the random factor for participants led to singular fits at some electrodes, indicating that its inclusion did not explain much variance in the data. Because including or excluding the factor did not change the results, we nonetheless kept it in the model for all electrodes. Post hoc pairwise comparisons with Tukey correction showed increased ITPC for short as compared to long (difference estimate = 0.06, SE = 0.01, t = 4.17, p < .001) as well as for medium as compared to long (difference estimate = 0.05, SE = 0.01, t = 3.99, p < .001) durations, but no difference between short and medium durations (p = .98).

We investigated temporal constraints on the learning of multiword chunks through an auditory statistical learning paradigm. Results from a behavioral target detection task and frequency-domain EEG analysis suggest better learning for chunks shorter than 2.7 sec relative to chunks exceeding 3 sec. In the behavioral target detection task, participants showed the strongest decrease in RTs with exposure to chunk durations of 2.55 sec. This acceleration suggests that participants learned co-occurrence probabilities best in the medium duration, which most closely matches the proposed processing window; however, this did not differ from shorter chunk durations, leaving open whether and how the learning of shorter chunks is affected. The behavioral results were extended by the ITPC analysis on the EEG: During exposure, words and multiword chunks were tracked across durations, yet tracking declined for chunk durations beyond 2.55 sec. These results surfaced centrally on the scalp (at electrode Cz), where previous studies using EEG have also reported frequency tagging (Lu, Jin, Pan, & Ding, 2022; Ding et al., 2018; Jin, Zou, Zhou, & Ding, 2018). Overall, our results support the idea of an endogenous time constraint on (artificial) language learning: Participants learned co-occurrence probabilities between words best when their duration fell within processing windows for linguistic chunking of 2–3 sec. In this way, the temporal integration of words into larger multiword chunks may only work up to a certain temporal limit, possibly reflecting electrophysiological processing windows.

Our findings are consistent with the idea that temporal processing windows in the brain serve as an active means of information integration for efficient comprehension of word sequences (e.g., Lerner, Honey, Silbert, & Hasson, 2011; Hasson, Yang, Vallines, Heeger, & Rubin, 2008). In language research, it was suggested that chunks contain a maximum of six words (Frazier & Fodor, 1978), which in the time domain roughly corresponds to the median utterance duration in spontaneous speech of approximately 2.6 sec (Vollrath et al., 1992). Intriguingly, this is consistent with the mean duration of intuitively marked chunk boundaries of 2.55 sec during naturalistic speech comprehension (Vetchinnikova et al., 2023). Likewise, the chunking-related CPS was observed periodically at clause boundaries when clauses lasted 2.7 sec, but not 4 sec, possibly relating to an upper bound of the processing window (Roll et al., 2012). The current results do not allow a definite conclusion on whether we tap into an evoked or an oscillatory neural substrate, which we discuss in more detail below. Nevertheless, our findings are, in principle, consistent with the idea that low-frequency neural oscillations in the delta band may serve as a mechanism for chunk-level integration, acting as a time constraint. Delta-band oscillations were found to track syntactic structure within isochronous stimuli (Lo, Tung, Ke, & Brennan, 2022; Burroughs, Kazanina, & Houghton, 2021; Ding et al., 2016, 2017). Extending these findings, delta-band tracking was also found for paired-word sequences based on an abstract chunking task (Lu et al., 2022; Jin, Lu, & Ding, 2020). In this way, delta-band oscillations may act as an endogenous pacemaker for cognitive and, particularly, language processing (Meyer, Sun, & Martin, 2020a, 2020b; Meyer et al., 2016). In support of this, delta-band phase was found to enforce the termination of a multiword chunk after 2.7 sec (Henke & Meyer, 2021).

Our EEG results suggest that the ability to learn new chunks is temporally constrained by an upper bound. Yet, they do not provide evidence for a lower bound. Generally, our results did not reveal differences between the short (1.95 sec) and medium (2.55 sec) chunk durations. However, the chunk recognition results from the EEG experiment showed above-chance performance for the medium duration only, but not for the short and long durations, which does hint at a lower bound as well. Taking both the EEG and behavioral results into account, we may only speculate that the duration of the shortest chunks might still be above a lower bound of the processing window. In support of this, Roll and colleagues (2012) only observed a CPS at chunk boundaries at 2.7 sec, but not at 1.8 sec. Notably, even our shortest chunks had a comparably long duration of 1.95 sec. Likewise, prior work found that accuracy in an auditory working memory task decreased when an externally marked chunking rate exceeded the delta-band range (i.e., was faster) compared to when falling within the delta-band range (Ghitza, 2017), which was also accompanied by a reduced neural response in the MEG (Rimmele, Poeppel, & Ghitza, 2021). In like manner, previously reported frequency tagging at the phrasal level (e.g., Ding et al., 2016) vanishes when the frequency of the phrases falls above the delta-band range (Lo, 2021). In this way, it may be possible that the learning of chunks is hampered when their duration is too short; however, the present study has not reached this minimal duration.

The current results link the idea of an electrophysiological timing constraint on language processing—as modeled here by artificial language learning—to general notions of temporal limitation of cognition outside language. Across domains, a window of 2–3 sec has been proposed for the integration of sensory information (Pöppel, 1997, 2009; see also Wittmann, 2011), whereas beyond this limit, events are not perceived as a unit (Fraisse, 1984). For instance, performance on the reproduction of time intervals (Ulbrich, Churan, Fink, & Wittmann, 2007; Kagerer, Wittmann, Szelag, & Steinbüchel, 2002; Szelag, Kowalska, Rymarczyk, & Pöppel, 2002), integration of metronome beats (Szelag, Stanczyk, & Szymaszek, 2022; Szelag, von Steinbüchel, Reiser, Gilles de Langen, & Pöppel, 1996), and sensorimotor synchronization (Mates, Müller, Radil, & Pöppel, 1994) decreased for intervals exceeding 2–3 sec. On the neural level, accurate temporal reproduction was accompanied by a negative shift in the ERP, which was reduced for durations exceeding 2–3 sec (Kononowicz, Sander, & van Rijn, 2015; Elbert, Ulrich, Rockstroh, & Lutzenberger, 1991). Similarly, the neural mismatch response was reduced when the ISI exceeded 3 sec (Wang, Lin, Zhou, Pöppel, & Bao, 2015; however, note that these differences were driven by gender differences), and neural activity for duration maintenance in working memory changed for durations > 3 sec (Chen, Chen, Kuang, & Huang, 2015). Together, this suggests that the processing window may shape information integration in diverse aspects of cognition. Future research across domains could extend our findings by investigating the neural mechanisms that support the integration of information—possibly, cycles of low-frequency neural activity—rather than the response to durations exceeding the processing window.
In addition, we must note that previous research has also reported differences in statistical learning across modalities (e.g., Emberson et al., 2011; Conway & Christiansen, 2009). Accordingly, prior work on visual processing during reading has observed shorter processing windows of 0.5–1 sec (Henke, Lewis, & Meyer, 2023; Lo, Anderson, Henke, & Meyer, 2023) than the windows for linguistic chunking of 2–3 sec proposed here for audition. Although these shorter and longer time windows may relate to the multiscale nature of prosody, where multiple short intermediate phrases may be integrated into longer prosodic phrases (Stehwien & Meyer, 2022; Inbar, Grossman, & Landau, 2020), chunking windows could, in principle, also differ across modalities. Future research thus needs to investigate the temporal constraint on the learning of statistical regularities across modalities.

At this point, we can only hypothesize about the exact nature of the electrophysiological mechanism that constrains the processing windows. We suggest that the bandwidth of low-frequency oscillatory activity may serve as a temporal limit to processing and, in this way, constrains the duration of learnable units. In particular, such oscillations were suggested to be involved in the domain-general perception of simultaneity, linking and integrating different stimuli (Northoff, 2016). In addition, low-frequency oscillations are involved in coordinating neural activity across cortical regions (Buzsáki, Anastassiou, & Koch, 2012). In this way, they might contribute to relating bottom–up processing of the acoustic signal in auditory cortices to top–down processing in frontal areas (cf. Rimmele et al., 2021; Keitel, Gross, & Kayser, 2018; Park, Ince, Schyns, Thut, & Gross, 2015). This coordination between the involved brain areas may only work within a certain frequency band, where ongoing oscillations can align their phase to the input structure. These limits may also relate to a relationship between the chunking-related delta-band oscillations and a higher frequency band that represents individual (to-be-chunked) words—analogous to phase–amplitude coupled theta- and gamma-band oscillations in working memory (Lisman & Jensen, 2013).

In our data, tracking of chunks was only visible in the ITPC analysis, but not in power. This may be because of the 1/f trend of the power spectral density at the low frequencies of interest (i.e., 0.32, 0.39, and 0.51 Hz), where ITPC is more robust to background fluctuations. In addition, it has been suggested that phase locking is a more reliable indicator of speech tracking than spectral power (Peelle, Gross, & Davis, 2013; Luo & Poeppel, 2007; Ahissar et al., 2001; for a discussion, see Kabdebon, Pena, Buiatti, & Dehaene-Lambertz, 2015). In contrast, we observed tracking at the frequencies of the individual words in both power and ITPC. This may reflect entrainment (proper; cf. Meyer et al., 2020b), as phase locking of the neural signal is likely to be driven by the acoustic cues present at the word rate. However, as is always the case for such findings, there is no definitive proof of the entrainment interpretation, because the current paradigm was not suited to reveal potential sustained effects.

We certainly acknowledge that, in principle, the frequency-tagging paradigm does not allow us to draw an unequivocal conclusion on whether the temporal constraint actually reflects an oscillatory mechanism or a succession of evoked responses that are elicited at the end of a chunk (Zoefel, Archer-Boyd, & Davis, 2018; Keitel, Quigley, & Ruhnau, 2014; Capilla, Pazo-Alvarez, Darriba, Campo, & Gross, 2011). We note that it has been shown that the amplitude of auditory evoked potentials increases with longer ISIs (e.g., Pereira et al., 2014; Nelson & Lassman, 1968; for a review, see also Crowley & Colrain, 2004). However, these amplitude changes should prominently influence the response to the individual words rather than the chunk boundaries and may affect the power spectral density, but less so the ITPC. Focusing on chunking, it has been hypothesized that a progression of oscillatory cycles at delta-band frequency may underlie the time-domain CPS (Meyer et al., 2016; Sauseng et al., 2007). Indeed, the CPS has been related to the duration-dependent termination of multiword chunks: Generally, longer segment durations increase the likelihood for elicitation of a CPS (Hwang & Steinhauer, 2011), specifically, when the terminated unit is 2.7 sec long (Roll et al., 2012).

In both of the current experiments, performance differences between chunk durations were only revealed during the learning phase (i.e., an implicit target detection task and frequency tagging in the EEG), but not in the offline chunk recognition task. Previous research on statistical learning suggested different cognitive processes for online and offline tasks: Online tasks may measure the dynamic learning process (implicit knowledge), whereas offline tasks may reflect the learning outcome (explicit knowledge; Batterink, Reber, Neville, et al., 2015; Batterink, Reber, & Paller, 2015). This is supported by weak or absent correlations between online and offline measures (Pinto et al., 2022; Batterink & Paller, 2017; Franco et al., 2015; but see also Moser et al., 2021; Siegelman, Bogaerts, Kronenfeld, & Frost, 2018; Dale, Duran, & Morehead, 2012) and different learning measures more generally (Pinto et al., 2022; Batterink & Paller, 2017; Batterink, Reber, Neville, et al., 2015; Franco et al., 2015; Siegelman & Frost, 2015). Accordingly, participants did not indicate awareness of the structural regularities in the input during debriefing (e.g., Moser et al., 2021; Turk-Browne, Jungé, & Scholl, 2005). Overall, this suggests that participants cannot access the implicitly acquired knowledge to make explicit judgments. In the present study, this could mean that lacking performance differences across chunk durations in the chunk recognition task may only reflect explicit knowledge and thus underestimate the learning outcome (Batterink, Reber, Neville, et al., 2015). On the other hand, participants could have different learning trajectories or require different exposure time for learning the different chunk durations, yet they may have learned all chunks equally well after the full exposure (Siegelman et al., 2018).
Critically, explicit offline measures may not properly represent the acquired knowledge as they may tap into cognitive abilities other than the learning outcome as such and show psychometric weaknesses (for a detailed discussion, see Siegelman, Bogaerts, & Frost, 2017). In particular, those tasks are restricted in the number of trials because of possible inferences based on repeated exposure to nonlearned items and thus lack statistical power. Moreover, performance in the chunk recognition task of our online experiment might have been boosted through the explicit feedback on participants' answers, which we removed for the EEG experiment, as well as through increased attention to the exposure stream mediated by the target detection task. For instance, prior work has shown impaired auditory statistical learning under attentional load (Fernandes, Kolinsky, & Ventura, 2010; Toro, Sinnett, & Soto-Faraco, 2005; however, see Batterink & Paller, 2019, for opposite results on neural measures of statistical learning) and during sleep (Batterink & Zhang, 2022), which might have decreased performance in the chunk recognition task of the EEG experiment. However, the above-chance and the trend toward above-chance performance in the medium and short duration, respectively, together with the nonsignificant difference in the long condition, directly reflect the observed frequency-domain EEG results. Thus, they may still indicate learning. However, this interpretation would argue against the above suggestion of a lower bound in the behavioral data.

It was suggested that statistical learning reflects an individual ability (Siegelman, Bogaerts, Christiansen, & Frost, 2017; Siegelman & Frost, 2015). For instance, there are great interindividual performance differences in measures of statistical learning (cf. Pinto et al., 2022) that were correlated with other cognitive abilities, such as sentence processing (Kidd & Arciuli, 2016; Misyak & Christiansen, 2012; Misyak, Christiansen, & Tomblin, 2010). To investigate interindividual differences in terms of an endogenous constraint, future work might employ the current setup with a more fine-grained temporal gradient of chunk durations.

Our experiment used frequency tagging to investigate the learning of multiword chunks. To relate our results to the duration of chunks and avoid potential confounds, both the word and the chunk rate were kept isochronous within each condition. However, natural speech only shows quasi-rhythmicity, and words need to be integrated, although they do not arrive in perfect isochrony (for a detailed discussion, see Ghitza, 2020; Giraud, 2020; Gwilliams, 2020; Haegens, 2020; Klimovich-Gray & Molinaro, 2020; Lewis, 2020; Meyer et al., 2020a, 2020b). To increase ecological validity, future research may investigate how a nonisochronous word rate influences the learning of multiword chunks in relation to the time constraint. We would predict that the isochrony of the individual words should not influence chunk learning as long as the chunk durations fall within a time window of 2–3 sec.

In the present paradigm, TPs were high between successive (i.e., adjacent) words within a chunk. Yet, naturalistic language also requires the learning of nonadjacent relationships (such as in syntax, morphosyntax, or number agreement). Previous work on statistical learning has shown that infants and adults can learn nonadjacent dependencies within trisyllabic words (e.g., Kabdebon et al., 2015; Buiatti, Peña, & Dehaene-Lambertz, 2009). In natural language, identification of grammatical disagreement takes longer for dependency distances above 2.7 sec as compared to shorter distances (Schremm et al., 2016; Experiment 1). Likewise, work on number agreement has observed distinct neural responses for distances between dependent words above and below 3 sec (Roll et al., 2013), suggesting different underlying processing mechanisms. Hence, we would hypothesize that dependencies may be best resolved—and thus also learned—within a chunk/processing window. In support of this, dependency distance was found to be minimized across languages (Ferrer-i-Cancho, Gómez-Rodríguez, Esteban, & Alemany-Puig, 2022; Futrell, Mahowald, & Gibson, 2015; for a review, see Liu, Xu, & Liang, 2017), possibly enabling us to resolve dependencies within a temporally constrained processing window. Likewise, syntactic and prosodic boundaries often coincide (for a review, see Wagner & Watson, 2010), and thus, syntactic dependencies may be confined within a prosodic unit. Further research should investigate whether (statistical) learning of nonadjacent dependencies may be constrained within the boundaries of a processing window.
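The adjacent TPs underlying such a paradigm can be sketched as forward transitional probabilities, TP(B|A) = count(A→B) / count(A). A minimal Python illustration with hypothetical word labels (`w1` to `w6` are placeholders, not the actual pseudowords):

```python
from collections import Counter

def transitional_probabilities(sequence):
    """Forward TPs between adjacent items: TP(B|A) = count(A->B) / count(A)."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    first_counts = Counter(sequence[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Hypothetical stream built from two three-word chunks (w1-w2-w3, w4-w5-w6):
# within-chunk transitions are deterministic (TP = 1.0), whereas the
# chunk-final word w3 is followed by different chunks, yielding lower TPs.
stream = ["w1", "w2", "w3", "w4", "w5", "w6",
          "w1", "w2", "w3", "w1", "w2", "w3", "w4", "w5", "w6"]
tps = transitional_probabilities(stream)
# tps[("w1", "w2")] == 1.0; tps[("w3", "w4")] and tps[("w3", "w1")] are lower.
```

The TP dip at chunk-final words is the statistical cue that marks chunk boundaries for learners; nonadjacent dependencies would require conditioning on items further back in the sequence.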

Besides experimental work on human language processing, computational frameworks have been implemented to model the chunking of language input (Yang, Frank, & van den Bosch, 2020; Anderson, Vilares, & Gómez-Rodríguez, 2019; McCauley & Christiansen, 2019). The resulting, automatically detected chunks already perform well in predicting language comprehension and production (e.g., Chalas et al., 2024; Lo et al., 2023; McCauley & Christiansen, 2017). Yet, the present findings on a temporal limit for the duration of multiword chunks for language processing may be useful for further improving the computational models. Adding a temporal limit to the duration of chunks would also make them neurobiologically more plausible, assuming that the time constraint indeed results from electrophysiological processing limitations. However, we note that the incorporation of such a temporal constraint will be challenging as the models operate on text input, where duration is difficult to define. Future research might address this issue by combining text and auditory input for modeling chunking.

In conclusion, we show that a previously proposed temporal constraint on linguistic chunking also influences (artificial) language learning based on TPs. Co-occurrence probabilities between multiple words may only be extracted when their durations fall within a time-limited processing window for linguistic chunking of ∼2.7 sec. Ultimately, our results could help identify an optimal input rate for language learning and processing.

We thank Laura Riedel, Antonia Schmidt, and Finja Klemm for help with stimulus preparation and data collection.

Corresponding author: Lena Henke, MPRG Language Cycles, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103 Leipzig, Germany, or via e-mail: [email protected].

Data cannot be made publicly available because of ethical permissions and legal restrictions. Aggregated data and code will be made available upon reasonable request to the corresponding author. Stimulus materials can be found at osf.io/wkx4d/.

Lena Henke: Conceptualization; Formal analysis; Methodology; Writing—Original draft; Writing—Review & editing. Lars Meyer: Conceptualization; Formal analysis; Funding acquisition; Methodology; Supervision; Writing—Original draft; Writing—Review & editing.

Max-Planck-Gesellschaft (https://dx.doi.org/10.13039/501100004189), grant number: MPRG Language Cycles.

Retrospective analysis of the citations in every article published in this journal from 2010 to 2021 reveals a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience (JoCN) during this period were M(an)/M = .407, W(oman)/M = .32, M/W = .115, and W/W = .159, the comparable proportions for the articles that these authorship teams cited were M/M = .549, W/M = .257, M/W = .109, and W/W = .085 (Postle and Fulvio, JoCN, 34:1, pp. 1–3). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article's gender citation balance. The authors of this article report its proportions of citations by gender category to be M/M = .448, W/M = .25, M/W = .125, and W/W = .177.

Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke, H., & Merzenich, M. M. (2001). Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proceedings of the National Academy of Sciences, U.S.A., 98, 13367–13372.
Anderson, M., Vilares, D., & Gómez-Rodríguez, C. (2019). Artificially evolved chunks for morphosyntactic analysis. In M. Candito, K. Evang, S. Oepen, & D. Seddah (Eds.), Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019) (pp. 133–143). Paris, France: Association for Computational Linguistics.
Anwyl-Irvine, A., Dalmaijer, E. S., Hodges, N., & Evershed, J. K. (2021). Realistic precision and accuracy of online experiment platforms, web browsers, and devices. Behavior Research Methods, 53, 1407–1425.
Anwyl-Irvine, A., Massonnié, J., Flitton, A., Kirkham, N., & Evershed, J. K. (2020). Gorilla in our midst: An online behavioral experiment builder. Behavior Research Methods, 52, 388–407.
Baddeley, A. D., Thomson, N., & Buchanan, M. (1975). Word length and the structure of short-term memory. Journal of Verbal Learning and Verbal Behavior, 14, 575–589.
Batterink, L. J., & Paller, K. A. (2017). Online neural monitoring of statistical learning. Cortex, 90, 31–45.
Batterink, L. J., & Paller, K. A. (2019). Statistical learning of speech regularities can occur outside the focus of attention. Cortex, 115, 56–71.
Batterink, L. J., Reber, P. J., Neville, H. J., & Paller, K. A. (2015). Implicit and explicit contributions to statistical learning. Journal of Memory and Language, 83, 62–78.
Batterink, L. J., Reber, P. J., & Paller, K. A. (2015). Functional differences between statistical learning with and without explicit training. Learning & Memory, 22, 544–556.
Batterink, L. J., & Zhang, S. (2022). Simple statistical regularities presented during sleep are detected but not retained. Neuropsychologia, 164, 108106.
Boersma, P. (2001). Praat, a system for doing phonetics by computer. Glot International, 5, 341–345.
Bögels, S., Schriefers, H., Vonk, W., & Chwilla, D. J. (2011a). Prosodic breaks in sentence processing investigated by event-related potentials. Language and Linguistics Compass, 5, 424–440.
Bögels, S., Schriefers, H., Vonk, W., & Chwilla, D. J. (2011b). The role of prosodic breaks and pitch accents in grouping words during on-line sentence processing. Journal of Cognitive Neuroscience, 23, 2447–2467.
Buiatti, M., Peña, M., & Dehaene-Lambertz, G. (2009). Investigating the neural correlates of continuous speech computation with frequency-tagged neuroelectric responses. Neuroimage, 44, 509–519.
Burroughs, A., Kazanina, N., & Houghton, C. (2021). Grammatical category and the neural processing of phrases. Scientific Reports, 11, 1–10.
Buzsáki, G., Anastassiou, C. A., & Koch, C. (2012). The origin of extracellular fields and currents—EEG, ECoG, LFP and spikes. Nature Reviews Neuroscience, 13, 407–420.
Capilla, A., Pazo-Alvarez, P., Darriba, A., Campo, P., & Gross, J. (2011). Steady-state visual evoked potentials can be explained by temporal superposition of transient event-related responses. PLoS One, 6, e14543.
Castellanos, N. P., & Makarov, V. A. (2006). Recovering EEG brain signals: Artifact suppression with wavelet enhanced independent component analysis. Journal of Neuroscience Methods, 158, 300–312.
Chalas, N., Meyer, L., Lo, C.-W., Park, H., Kluger, D. S., Abbasi, O., et al. (2024). Dissociating prosodic from syntactic delta activity during natural speech comprehension. Current Biology, 34, 3537–3549.
Chen, Y. G., Chen, X., Kuang, C. W., & Huang, X. T. (2015). Neural oscillatory correlates of duration maintenance in working memory. Neuroscience, 290, 389–397.
Christiansen, M. H., & Chater, N. (2016). The now-or-never bottleneck: A fundamental constraint on language. Behavioral and Brain Sciences, 39, e62.
Conway, C. M., & Christiansen, M. H. (2009). Seeing and hearing in space and time: Effects of modality and presentation rate on implicit statistical learning. European Journal of Cognitive Psychology, 21, 561–580.
Crowley, K. E., & Colrain, I. M. (2004). A review of the evidence for P2 being an independent component process: Age, sleep and modality. Clinical Neurophysiology, 115, 732–744.
Dale, R., Duran, N. D., & Morehead, J. R. (2012). Prediction during statistical learning, and implications for the implicit/explicit divide. Advances in Cognitive Psychology, 8, 196–209.
Delorme, A., & Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, 134, 9–21.
Ding, N. (2023). Interpretation and analysis of the steady-state neural response to complex sequential structures: A methodological note. arXiv.
Ding, N., Melloni, L., Yang, A., Wang, Y., Zhang, W., & Poeppel, D. (2017). Characterizing neural entrainment to hierarchical linguistic units using electroencephalography (EEG). Frontiers in Human Neuroscience, 11, 1–9.
Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19, 158–164.
Ding, N., Pan, X., Luo, C., Su, N., Zhang, W., & Zhang, J. (2018). Attention is required for knowledge-based sequential grouping: Insights from the integration of syllables into words. Journal of Neuroscience, 38, 1178–1188.
Donoghue, T., Haller, M., Peterson, E. J., Varma, P., Sebastian, P., Gao, R., et al. (2020). Parameterizing neural power spectra into periodic and aperiodic components. Nature Neuroscience, 23, 12.
Elbert, T., Ulrich, R., Rockstroh, B., & Lutzenberger, W. (1991). The processing of temporal intervals reflected by CNV-like brain potentials. Psychophysiology, 28, 648–655.
Emberson, L. L., Conway, C. M., & Christiansen, M. H. (2011). Timing is everything: Changes in presentation rate have opposite effects on auditory and visual implicit statistical learning. Quarterly Journal of Experimental Psychology, 64, 1021–1040.
Fernandes, T., Kolinsky, R., & Ventura, P. (2010). The impact of attention load on the use of statistical information and coarticulation as speech segmentation cues. Attention, Perception, & Psychophysics, 72, 1522–1532.
Ferrer-i-Cancho, R., Gómez-Rodríguez, C., Esteban, J. L., & Alemany-Puig, L. (2022). Optimality of syntactic dependency distances. Physical Review E, 105, 014308.
Fraisse, P. (1984). Perception and estimation of time. Annual Review of Psychology, 35, 1–37.
Franco, A., Eberlen, J., Destrebecqz, A., Cleeremans, A., & Bertels, J. (2015). Rapid serial auditory presentation: A new measure of statistical learning in speech segmentation. Experimental Psychology, 62, 346–351.
Frazier, L., & Fodor, J. D. (1978). The sausage machine: A new two-stage parsing model. Cognition, 6, 291–325.
Frost, R., Armstrong, B. C., & Christiansen, M. H. (2019). Statistical learning research: A critical review and possible new directions. Psychological Bulletin, 145, 1128–1153.
Futrell, R., Mahowald, K., & Gibson, E. (2015). Large-scale evidence of dependency length minimization in 37 languages. Proceedings of the National Academy of Sciences, U.S.A., 112, 10336–10341.
Gabard-Durnam, L. J., Mendez Leal, A. S., Wilkinson, C. L., & Levin, A. R. (2018). The Harvard automated processing pipeline for electroencephalography (HAPPE): Standardized processing software for developmental and high-artifact data. Frontiers in Neuroscience, 12, 97.
Gerster, M., Waterstraat, G., Litvak, V., Lehnertz, K., Schnitzler, A., Florin, E., et al. (2022). Separating neural oscillations from aperiodic 1/f activity: Challenges and recommendations. Neuroinformatics, 20, 991–1012.
Ghitza, O. (2017). Acoustic-driven delta rhythms as prosodic markers. Language, Cognition and Neuroscience, 32, 545–561.
Ghitza, O. (2020). "Acoustic-driven oscillators as cortical pacemaker": A commentary on Meyer, Sun & Martin (2019). Language, Cognition and Neuroscience, 35, 1100–1105.
Giraud, A.-L. (2020). Oscillations for all ¯\_(ツ)_/¯? A commentary on Meyer, Sun & Martin (2020). Language, Cognition and Neuroscience, 35, 1106–1113.
Gwilliams, L. (2020). Hierarchical oscillators in speech comprehension: A commentary on Meyer, Sun, and Martin (2019). Language, Cognition and Neuroscience, 35, 1114–1118.
Haegens, S. (2020). Entrainment revisited: A commentary on Meyer, Sun, and Martin (2020). Language, Cognition and Neuroscience, 35, 1119–1123.
Hasson, U., Yang, E., Vallines, I., Heeger, D. J., & Rubin, N. (2008). A hierarchy of temporal receptive windows in human cortex. Journal of Neuroscience, 28, 2539–2550.
Henin, S., Turk-Browne, N. B., Friedman, D., Liu, A., Dugan, P., Flinker, A., et al. (2021). Learning hierarchical sequence representations across human cortex and hippocampus. Science Advances, 7, 4530.
Henke, L., Lewis, A. G., & Meyer, L. (2023). Fast and slow rhythms of naturalistic reading revealed by combined eye-tracking and electroencephalography. Journal of Neuroscience, 43, 4461–4469.
Henke, L., & Meyer, L. (2021). Endogenous oscillations time-constrain linguistic segmentation: Cycling the garden path. Cerebral Cortex, 31, 4289–4299.
Hwang, H., & Steinhauer, K. (2011). Phrase length matters: The interplay between implicit prosody and syntax in Korean "garden path" sentences. Journal of Cognitive Neuroscience, 23, 3555–3575.
Inbar, M., Grossman, E., & Landau, A. N. (2020). Sequences of intonation units form a ∼1 Hz rhythm. Scientific Reports, 10, 1–9.
Jin, P., Lu, Y., & Ding, N. (2020). Low-frequency neural activity reflects rule-based chunking during speech listening. eLife, 9, 742585.
Jin, P., Zou, J., Zhou, T., & Ding, N. (2018). Eye activity tracks task-relevant structures during speech and auditory sequence perception. Nature Communications, 9, 5374.
Kabdebon, C., Pena, M., Buiatti, M., & Dehaene-Lambertz, G. (2015). Electrophysiological evidence of statistical learning of long-distance dependencies in 8-month-old preterm and full-term infants. Brain and Language, 148, 25–36.
Kagerer, F. A., Wittmann, M., Szelag, E., & Steinbüchel, N. v. (2002). Cortical involvement in temporal reproduction: Evidence for differential roles of the hemispheres. Neuropsychologia, 40, 357–366.
Keitel, A., Gross, J., & Kayser, C. (2018). Perceptually relevant speech tracking in auditory and motor cortex reflects distinct linguistic features. PLoS Biology, 16, e2004473.
Keitel, C., Quigley, C., & Ruhnau, P. (2014). Stimulus-driven brain oscillations in the alpha range: Entrainment of intrinsic rhythms or frequency-following response? Journal of Neuroscience, 34, 10137–10140.
Kidd, E., & Arciuli, J. (2016). Individual differences in statistical learning predict children's comprehension of syntax. Child Development, 87, 184–193.
Klimovich-Gray, A., & Molinaro, N. (2020). Synchronising internal and external information: A commentary on Meyer, Sun & Martin (2020). Language, Cognition and Neuroscience, 35, 1129–1132.
Kononowicz, T. W., Sander, T., & van Rijn, H. (2015). Neuroelectromagnetic signatures of the reproduction of supra-second durations. Neuropsychologia, 75, 201–213.
Kuiper, K., Columbus, G., & Schmitt, N. (2009). The acquisition of phrasal vocabulary. In S. Foster-Cohen (Ed.), Language acquisition (pp. 216–240). London: Palgrave Macmillan UK.
Lerner, Y., Honey, C. J., Silbert, L. J., & Hasson, U. (2011). Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. Journal of Neuroscience, 31, 2906–2915.
Lewis, A. G. (2020). Balancing exogenous and endogenous cortical rhythms for speech and language requires a lot of entraining: A commentary on Meyer, Sun & Martin (2020). Language, Cognition and Neuroscience, 35, 1133–1137.
Lisman, J. E., & Jensen, O. (2013). The theta–gamma neural code. Neuron, 77, 1002–1016.
Liu, H., Xu, C., & Liang, J. (2017). Dependency distance: A new perspective on syntactic patterns in natural languages. Physics of Life Reviews, 21, 171–193.
Lo, C.-W. (2021). Testing low-frequency neural activity in sentence understanding [Thesis].
Lo, C.-W., Anderson, M., Henke, L., & Meyer, L. (2023). Periodic fluctuations in reading times reflect multi-word-chunking. Scientific Reports, 13, 18522.
Lo, C.-W., Tung, T.-Y., Ke, A. H., & Brennan, J. R. (2022). Hierarchy, not lexical regularity, modulates low-frequency neural synchrony during language comprehension. Neurobiology of Language, 3, 538–555.
Lu, Y., Jin, P., Pan, X., & Ding, N. (2022). Delta-band neural activity primarily tracks sentences instead of semantic properties of words. Neuroimage, 251, 118979.
Lukics, K. S., & Lukács, Á. (2021). Tracking statistical learning online: Word segmentation in a target detection task. Acta Psychologica, 215, 103271.
Luo, H., & Poeppel, D. (2007). Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron, 54, 1001–1010.
Makeig, S., Bell, A. J., Jung, T.-P., & Sejnowski, T. J. (1996). Independent component analysis of electroencephalographic data. In D. Touretzky, M. Mozer, & M. Hasselmo (Eds.), Advances in neural information processing systems 8 (pp. 145–151). Cambridge, MA: MIT Press.
Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164, 177–190.
Mates, J., Müller, U., Radil, T., & Pöppel, E. (1994). Temporal integration in sensorimotor synchronization. Journal of Cognitive Neuroscience, 6, 332–340.
McCauley, S. M., & Christiansen, M. H. (2017). Computational investigations of multiword chunks in language learning. Topics in Cognitive Science, 9, 637–652.
McCauley, S. M., & Christiansen, M. H. (2019). Language learning as language use: A cross-linguistic model of child language development. Psychological Review, 126, 1–51.
Meyer, L., Henry, M. J., Gaston, P., Schmuck, N., & Friederici, A. D. (2016). Linguistic bias modulates interpretation of speech via neural delta-band oscillations. Cerebral Cortex, 27, 4293–4302.
Meyer, L., Sun, Y., & Martin, A. E. (2020a). "Entraining" to speech, generating language? Language, Cognition and Neuroscience, 35, 1138–1148.
Meyer, L., Sun, Y., & Martin, A. E. (2020b). Synchronous, but not entrained: Exogenous and endogenous cortical rhythms of speech and language processing. Language, Cognition and Neuroscience, 35, 1089–1099.
Misyak, J. B., & Christiansen, M. H. (2012). Statistical learning and language: An individual differences study. Language Learning, 62, 302–331.
Misyak, J. B., Christiansen, M. H., & Tomblin, J. B. (2010). On-line individual differences in statistical learning predict language processing. Frontiers in Psychology, 1, 31.
Moser, J., Batterink, L., Li Hegner, Y., Schleger, F., Braun, C., Paller, K. A., et al. (2021). Dynamics of nonlinguistic statistical learning: From neural entrainment to the emergence of explicit knowledge. Neuroimage, 240, 118378.
Moulines, E., & Charpentier, F. (1990). Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication, 9, 453–467.
Nelson, D. A., & Lassman, F. M. (1968). Effects of intersignal interval on the human auditory evoked response. Journal of the Acoustical Society of America, 44, 1529–1532.
Northoff, G. (2016). Slow cortical potentials and "inner time consciousness"—A neuro-phenomenal hypothesis about the "width of present". International Journal of Psychophysiology, 103, 174–184.
Oldfield, R. C. (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia, 9, 97–113.
Oostenveld, R., Fries, P., Maris, E., & Schoffelen, J.-M. (2011). FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Computational Intelligence and Neuroscience, 2011, 1–9.
Ordin, M., Polyanskaya, L., Soto, D., & Molinaro, N. (2020). Electrophysiology of statistical learning: Exploring the online learning process and offline learning product. European Journal of Neuroscience, 51, 2008–2022.
Park, H., Ince, R. A. A., Schyns, P. G., Thut, G., & Gross, J. (2015). Frontal top–down signals increase coupling of auditory low-frequency oscillations to continuous speech in human listeners. Current Biology, 25, 1649–1653.
Peelle, J. E., Gross, J., & Davis, M. H. (2013). Phase-locked responses to speech in human auditory cortex are enhanced during comprehension. Cerebral Cortex, 23, 1378–1387.
Pellicer-Sánchez, A. (2019). Learning single words vs. multiword items. In The Routledge handbook of vocabulary studies. London: Routledge.
Pereira, D. R., Cardoso, S., Ferreira-Santos, F., Fernandes, C., Cunha-Reis, C., Paiva, T. O., et al. (2014). Effects of inter-stimulus interval (ISI) duration on the N1 and P2 components of the auditory event-related potential. International Journal of Psychophysiology, 94, 311–318.
Pinto, D., Prior, A., & Zion Golumbic, E. (2022). Assessing the sensitivity of EEG-based frequency-tagging as a metric for statistical learning. Neurobiology of Language, 3, 214–234.
Pöppel, E. (1997). A hierarchical model of temporal perception. Trends in Cognitive Sciences, 1, 56–61.
Pöppel, E. (2009). Pre-semantically defined temporal windows for cognitive processing. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 364, 1887–1896.
R Core Team. (2021). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Rimmele, J. M., Poeppel, D., & Ghitza, O. (2021). Acoustically driven cortical δ oscillations underpin prosodic chunking. eNeuro, 8, Article 0562-20.2021.
Roll, M., Gosselke, S., Lindgren, M., & Horne, M. (2013). Time-driven effects on processing grammatical agreement. Frontiers in Psychology, 4, 1004.
Roll, M., Lindgren, M., Alter, K., & Horne, M. (2012). Time-driven effects on parsing during reading. Brain and Language, 121, 267–272.
Saffran, J. R. (2003). Statistical language learning: Mechanisms and constraints. Current Directions in Psychological Science, 12, 110–114.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928.
Santolin, C., & Saffran, J. R. (2018). Constraints on statistical learning across species. Trends in Cognitive Sciences, 22, 52–63.
Sauseng, P., Klimesch, W., Gruber, W. R., Hanslmayr, S., Freunberger, R., & Doppelmayr, M. (2007). Are event-related potential components generated by phase resetting of brain oscillations? A critical discussion. Neuroscience, 146, 1435–1444.
Schneider, J. M., Weng, Y.-L., Hu, A., & Qi, Z. (2022). Linking the neural basis of distributional statistical learning with transitional statistical learning: The paradox of attention. Neuropsychologia, 172, 108284.
Schremm, A., Horne, M., & Roll, M. (2016). Time-driven effects on processing relative clauses. Journal of Psycholinguistic Research, 45, 1033–1044.
Siegelman, N., Bogaerts, L., Christiansen, M. H., & Frost, R. (2017). Towards a theory of individual differences in statistical learning. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 372, 20160059.
Siegelman, N., Bogaerts, L., & Frost, R. (2017). Measuring individual differences in statistical learning: Current pitfalls and possible solutions. Behavior Research Methods, 49, 418–432.
Siegelman, N., Bogaerts, L., Kronenfeld, O., & Frost, R. (2018). Redefining "learning" in statistical learning: What does an online measure reveal about the assimilation of visual regularities? Cognitive Science, 42, 692–727.
Siegelman, N., & Frost, R. (2015). Statistical learning as an individual ability: Theoretical perspectives and empirical evidence. Journal of Memory and Language, 81, 105–120.
Sonbul, S., & Siyanova-Chanturia, A. (2024). Incidental learning of multi-word expressions: Methodological considerations and future directions. In Researching incidental vocabulary learning in a second language. New York: Routledge.
Stehwien, S., & Meyer, L. (2022). Short-term periodicity of prosodic phrasing. In Proceedings of Speech Prosody (pp. 693–698).
Steinhauer, K., Alter, K., & Friederici, A. D. (1999). Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nature Neuroscience, 2, 191–196.
Szelag, E., Kowalska, J., Rymarczyk, K., & Pöppel, E. (2002). Duration processing in children as determined by time reproduction: Implications for a few seconds temporal window. Acta Psychologica, 110, 1–19.
Szelag, E., Stanczyk, M., & Szymaszek, A. (2022). Sub- and supra-second timing in auditory perception: Evidence for cross-domain relationships. Frontiers in Neuroscience, 15, 812533.
Szelag, E., von Steinbüchel, N., Reiser, M., Gilles de Langen, E., & Pöppel, E. (1996). Temporal constraints in processing of nonverbal rhythmic patterns. Acta Neurobiologiae Experimentalis, 56, 215–225.
Toro, J. M., Sinnett, S., & Soto-Faraco, S. (2005). Speech segmentation by statistical learning depends on attention. Cognition, 97, B25–B34.
Turk-Browne, N. B., Jungé, J. A., & Scholl, B. J. (2005). The automaticity of visual statistical learning. Journal of Experimental Psychology: General, 134, 552.
Ulbrich, P., Churan, J., Fink, M., & Wittmann, M. (2007). Temporal reproduction: Further evidence for two processes. Acta Psychologica, 125, 51–65.
van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., et al. (2016). WaveNet: A generative model for raw audio. arXiv.
Vetchinnikova, S., Konina, A., Williams, N., Mikušová, N., & Mauranen, A. (2023). Chunking up speech in real time: Linguistic predictors and cognitive constraints. Language and Cognition, 15, 453–479.
Vollrath, M., Kazenwadel, J., & Krüger, H. P. (1992). A universal constant in temporal segmentation of human speech—A reply to Schleidt and Feldhütter (1989). Naturwissenschaften, 79, 479–480.
Wagner, M., & Watson, D. G. (2010). Experimental and theoretical advances in prosody: A review. Language and Cognitive Processes, 25, 905–945.
Wang, L., Lin, X., Zhou, B., Pöppel, E., & Bao, Y. (2015). Subjective present: A window of temporal integration indexed by mismatch negativity. Cognitive Processing, 16(Suppl. 1), 131–135.
Wen, H., & Liu, Z. (2016). Separating fractal and oscillatory components in the power spectrum of neurophysiological signal. Brain Topography, 29, 13–26.
Winkler, I., Debener, S., Muller, K.-R., & Tangermann, M. (2015). On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP. In 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 4101–4105).
Winkler, I., Haufe, S., & Tangermann, M. (2011). Automatic classification of artifactual ICA-components for artifact removal in EEG signals. Behavioral and Brain Functions, 7, 30.
Wittmann, M. (2011). Moments in time. Frontiers in Integrative Neuroscience, 5, 66.
Yang, J., Frank, S. L., & van den Bosch, A. (2020). Less is better: A cognitively inspired unsupervised model for language segmentation. In M. Zock, E. Chersoni, A. Lenci, & E. Santus (Eds.), Proceedings of the Workshop on the Cognitive Aspects of the Lexicon (pp. 33–45). Association for Computational Linguistics.
Zoefel, B., Archer-Boyd, A., & Davis, M. H. (2018). Phase entrainment of brain oscillations causally modulates neural responses to intelligible speech. Current Biology, 28, 401–408.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.