Breaking the linguistic code requires the extraction of at least two types of information from the speech signal: the relations between linguistic units and their sequential position. Furthermore, these different types of information need to be integrated into a coherent representation of language structure. The brain networks responsible for these abilities are well known in adults, but not in young infants. Our results show that the neural architecture underlying these abilities is operational at birth. In three optical imaging studies, we found that the newborn brain detects identity relations, as evidenced by enhanced activation in the bilateral superior temporal and left inferior frontal regions. More importantly, the newborn brain can also determine whether such identity relations hold for the initial or final positions of speech sequences, as indicated by increased activity in the inferior frontal regions, possibly Broca's area. This implies that the neural foundations of language acquisition are in place from birth.
Human languages string together sounds into a linear stream of speech to express complex meanings. To understand these meanings, listeners have to extract several types of information from the signal simultaneously. They need to identify linguistic units, their sequential positions, and the relations that hold between them, and they need to bind these together. Adults and older infants have some remarkable abilities to acquire and process the structural properties of language (Gervain & Mehler, 2010; Friederici, 2002; Marcus, Vijayan, Rao, & Vishton, 1999; Saffran, Aslin, & Newport, 1996). Indeed, in a seminal paper, Marcus et al. (1999) showed that 7-month-old infants were able to learn and generalize structural regularities based on the identity relation, successfully discriminating between artificial grammars with ABB (e.g., “wo fe fe”), AAB and ABA structures. Relatively little is known, however, about when these abilities first emerge, what perceptual and learning mechanisms they involve, and what neural systems support them. Therefore, we investigated whether, at birth, babies are sensitive to sequential position in speech and whether they are able to integrate this information with other structural patterns. To identify the brain areas involved, we used near-infrared spectroscopy (NIRS).
Newborns have sophisticated speech perception abilities. They prefer human speech to complex speech analogues (Vouloumanos & Werker, 2007). They distinguish and prefer the language spoken by their mothers during pregnancy over other languages (Moon, Cooper, & Fifer, 1993; Mehler et al., 1988) and show larger left hemispheric brain activity to this language played forward than backward (Pena et al., 2003). They can detect the acoustic cues that signal word boundaries (Christophe, Dupoux, Bertoncini, & Mehler, 1994), discriminate words with different patterns of lexical stress (Sansavini, Bertoncini, & Giovanelli, 1997), and distinguish between function words (articles, pronouns, prepositions, determiners, etc.) and content words (nouns, verbs, adjectives, adverbs, etc.) on the basis of their different acoustic characteristics (Shi, Werker, & Morgan, 1999).
Newborns have also been shown to possess basic abilities to process structural regularities in speech. They are able to discriminate simple structures based on immediate repetitions of identical syllables (ABB: “mubaba”) from random sequences (ABC: “mubage”), as indicated by an increased neural response to ABB structures as compared with ABC structures in the temporal and frontal brain areas bilaterally but with a stronger involvement of the left hemisphere (LH; Gervain, Macagno, Cogoi, Pena, & Mehler, 2008). However, newborns' ability to discriminate ABB from ABC patterns is compatible with at least two processing mechanisms. One mechanism allows the simultaneous encoding and integration of multiple features of the speech sequences—minimally, the identity relation and its position. Alternatively, a mechanism that only detects one feature, the repetition but not its position, could be sufficient.
To decide whether young learners possess the simple repetition detector or the more complex repetition–position integrating mechanism, we explored whether neonates could discriminate simple repetition-based grammars that only differed in the sequential position of the repetition. We conducted three NIRS studies, in which we measured newborn infants' brain responses to different repetition-based artificial grammar stimuli and their controls. In Experiment 1, we explored whether newborns could discriminate sequence-initial repetitions (AAB) from random sequences (ABC) to test whether the ability to detect identity relations (Gervain et al., 2008) generalizes to the initial position. In Experiment 2, we tested the neonates' ability to discriminate between initial and final repetitions (AAB vs. ABB). In Experiment 3, we contrasted the salience of the two positions, testing whether either one is favored over the other.
In Experiment 1, we sought to establish that neonates were able to discriminate sequence-initial repetitions (AAB: “babamu”) from random sequences (ABC: “mubage”), enabling any subsequent comparisons between sequence-initial and sequence-final repetitions.
The two artificial grammars used in the experiments, AAB and ABC, generated trisyllabic words. Both grammars used the same syllabic repertoire of 20 consonant–vowel syllables (“ba,” “bi,” “du,” “ge,” “pe,” “pi,” “ta,” “to,” “ko,” “ku,” “lo,” “lu,” “mu,” “na,” “fi,” “fe,” “sha,” “sho,” “ze,” and “zi”). The syllables were organized into syllable pairs. A syllable pair was defined as two syllables with the same consonant but different vowels (“ba”–“bi”) or, failing that, with consonants from the same class (e.g., nasals) and different vowels (“mu”–“na”).
The material was constructed as follows. Half of the syllables were designated X syllables, and the other half, Y syllables, such that one member of each syllable pair was assigned to category X and the other to category Y. For the AAB grammar, X syllables served as the initial, repeated syllable in half of the blocks and Y syllables provided the unrepeated third syllable; in the other half of the blocks, the roles were reversed. Thus, each syllable appeared in each sequential position with equal frequency. In addition, each block used different pairings of A and B syllables. To maximize discriminability, two constraints were observed when pairing A and B syllables: they could not (1) contain the same vowel or (2) come from the same syllable pair. This resulted in seven possible words for each initial syllable, yielding 140 words in total. The 14 blocks thus exhausted all possible combinations without requiring any word to be repeated. The ABC words were derived from the repetition words by shuffling the repeated syllables among the words within a block.
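The pairing constraints can be checked with a short script. The sketch below is only illustrative: the text names two syllable pairs (“ba”–“bi” and “mu”–“na”), so the remaining pairings and the X/Y assignment in the code are hypothetical choices. Under this particular split, each vowel occurs twice in X and twice in Y, and the two constraints leave exactly seven B candidates per initial syllable, reproducing the 140-word count.

```python
# Hypothetical syllable pairs. Only "ba"-"bi" and "mu"-"na" are given in the
# text; the other pairings and the X/Y split below are assumptions.
PAIRS = [
    ("ba", "bi"), ("pi", "pe"), ("to", "ta"), ("ku", "ko"), ("lo", "lu"),
    ("fe", "fi"), ("sha", "sho"), ("zi", "ze"), ("mu", "na"), ("ge", "du"),
]
X = [x for x, _ in PAIRS]  # first member of each pair -> category X
Y = [y for _, y in PAIRS]  # second member of each pair -> category Y

PARTNER = {}
for x, y in PAIRS:
    PARTNER[x], PARTNER[y] = y, x


def vowel(syllable):
    """All syllables here are consonant letter(s) followed by one vowel letter."""
    return syllable[-1]


def compatible(a, b):
    """A and B may not (1) share a vowel or (2) come from the same pair."""
    return vowel(a) != vowel(b) and PARTNER[a] != b


def aab_words():
    """All AAB words: A drawn from one category, B from the other."""
    words = []
    for a_set, b_set in ((X, Y), (Y, X)):
        for a in a_set:
            words.extend((a, a, b) for b in b_set if compatible(a, b))
    return words


words = aab_words()
print(len(words))  # 140 under the assumed pairing
```

Representing words as syllable tuples rather than strings keeps the positions unambiguous (e.g., “sha” versus “s” + “ha”).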
Words were synthesized with the fr4 French female voice of the MBROLA diphone database. Syllables were 270 msec long (consonant: 120 msec; vowel: 150 msec) and had a flat pitch of 200 Hz.
As a consequence of the design, the repetition grammar was matched to the random grammar on (1) the overall frequency of all syllables, (2) the frequency of each syllable in each sequential position, and (3) all phonological and prosodic characteristics. The distribution of transitional probabilities (TPs) was also equated by keeping the TPs between certain designated BC syllables as high as those between the repeated syllables.
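Matching on positional frequency is easy to verify mechanically. The following sketch, using a hypothetical three-word toy block, counts each (position, syllable) combination; equal counts imply criteria (1) and (2) above, since overall frequencies are sums over positions. It does not check transitional probabilities.

```python
from collections import Counter


def positional_counts(words):
    """Count each (position, syllable) combination across a list of words."""
    return Counter((pos, syl) for word in words for pos, syl in enumerate(word))


def matched_on_position(grammar_a, grammar_b):
    """True if every syllable occurs equally often in every position in both
    word lists. Overall syllable frequencies then match as well."""
    return positional_counts(grammar_a) == positional_counts(grammar_b)


# Toy block with hypothetical items: the ABC words here rotate the second
# syllables of the AAB words, so positional counts are unchanged.
aab = [("ba", "ba", "pe"), ("pi", "pi", "ta"), ("to", "to", "fi")]
abc = [("ba", "pi", "pe"), ("pi", "to", "ta"), ("to", "ba", "fi")]
print(matched_on_position(aab, abc))  # True
```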
Within blocks, words were separated by pauses of varying length (0.5–1.5 sec), yielding blocks of about 18 sec (Figure 1A). Blocks were separated by silences of varying duration (25–35 sec) so as not to induce phase-locked brain responses. We used a simple block design: the 28 blocks (14 per condition) were presented in an interleaved fashion, in a pseudorandomized order counterbalanced across participants (Figure 1A), with at most two consecutive blocks of the same condition.
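A minimal sketch of how such a schedule could be generated, assuming 10 words per block (140 words over 14 blocks), 810-msec words (three 270-msec syllables), and uniformly distributed pauses. The rejection-sampling approach to the ordering constraint is an illustrative choice, not necessarily how the original order was produced.

```python
import random

WORD_DUR = 3 * 0.270  # three 270-msec syllables per word (sec)


def block_order(n_per_condition=14, conditions=("AAB", "ABC"), max_run=2, rng=None):
    """Shuffle until no more than `max_run` consecutive blocks share a condition."""
    rng = rng or random.Random()
    order = [c for c in conditions for _ in range(n_per_condition)]
    while True:
        rng.shuffle(order)
        # A window of max_run + 1 identical labels would be a run that is too long.
        if all(len(set(order[i:i + max_run + 1])) > 1
               for i in range(len(order) - max_run)):
            return list(order)


def word_onsets(n_blocks, words_per_block=10, rng=None):
    """Onset times (sec) of every word: 0.5-1.5 sec pauses between words and
    25-35 sec silences between blocks (uniform jitter assumed)."""
    rng = rng or random.Random()
    t, blocks = 0.0, []
    for _ in range(n_blocks):
        onsets = []
        for i in range(words_per_block):
            onsets.append(t)
            t += WORD_DUR
            if i < words_per_block - 1:
                t += rng.uniform(0.5, 1.5)
        blocks.append(onsets)
        t += rng.uniform(25.0, 35.0)
    return blocks


rng = random.Random(0)
order = block_order(rng=rng)
onsets = word_onsets(len(order), rng=rng)
```

Because the pauses are drawn anew for every word, the timing of repetitions within a block is not periodic, which is the property the design relies on.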
Twenty-two healthy, full-term neonates (9 girls; mean age = 1.14 days, range = 0–3 days; Apgar score ≥ 8) born in the Vancouver area participated in Experiment 1. Data from 13 additional infants were collected but excluded from the data analysis, as they (1) failed to complete the experiment because of fussiness and crying (11 infants) or (2) provided poor quality data because of large motion artifacts or thick hair (2 infants). All parents gave informed consent before participation. The ethics boards of the University of British Columbia and BC Women's Hospital, where the experiments took place, granted permission.
Infants were tested with a Hitachi ETG-4000 NIRS machine (source–detector separation: 3 cm; two continuous wavelengths of 695 and 830 nm) in a dimly lit, sound-attenuated room at BC Women's Hospital, lying in their cribs throughout the 22–25-min test session. At least one parent was present at all times. Babies were tested while in a state of quiet rest or sleep. The NIRS optical probes were placed on infants' heads bilaterally (12 channels per hemisphere; Figure 1C) using the tragus, the vertex, and the ears as surface landmarks (Gervain et al., 2008; Pena et al., 2003).
Sound stimuli were presented through two loudspeakers positioned 1.5 m from the babies' heads at an angle of 30° and elevated to the height of the infants' cribs. A portable Macintosh computer running the PsyScope experimental software played the stimuli and operated the NIRS machine. The laser power of the NIRS machine was 0.7 mW.
Data Processing and Analysis
Changes in the concentrations of oxygenated hemoglobin (oxyHb) and deoxygenated hemoglobin (deoxyHb), metabolic indicators of neural activity, were calculated from the absorption of near-infrared light. Both oxyHb and deoxyHb were entered into the data analysis.
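Concentration changes of this kind are usually derived with the modified Beer–Lambert law: the change in optical density at each wavelength is a weighted sum of the oxyHb and deoxyHb changes, so two wavelengths give two equations in two unknowns. The sketch below solves that 2 × 2 system for the 695- and 830-nm wavelengths used here. The extinction coefficients are placeholders, and the unknown path length is folded into the result (concentration × length), an assumption in line with the mmol × mm units of the artifact criterion; the device's own conversion may differ.

```python
import math

# Placeholder extinction coefficients (oxyHb, deoxyHb) per wavelength (nm).
# Illustrative values only: deoxyHb absorbs more than oxyHb near 695 nm and
# less near 830 nm. Substitute the coefficients used by your own device.
EPS = {695: (0.28, 2.0), 830: (0.97, 0.69)}


def delta_od(i_baseline, i_now):
    """Change in optical density from detected light intensities."""
    return -math.log10(i_now / i_baseline)


def mbll(d_od_695, d_od_830, eps=EPS):
    """Solve the two-wavelength modified Beer-Lambert system
        dOD(w) = eps_oxy(w) * dOxy + eps_deoxy(w) * dDeoxy
    by Cramer's rule. The unknown path length is folded into the unknowns,
    so the results are concentration x length."""
    (a1, b1), (a2, b2) = eps[695], eps[830]
    det = a1 * b2 - a2 * b1
    d_oxy = (d_od_695 * b2 - d_od_830 * b1) / det
    d_deoxy = (a1 * d_od_830 - a2 * d_od_695) / det
    return d_oxy, d_deoxy


# Round trip: synthesize optical-density changes, then recover the inputs.
true_oxy, true_deoxy = 0.05, -0.02
d695 = EPS[695][0] * true_oxy + EPS[695][1] * true_deoxy
d830 = EPS[830][0] * true_oxy + EPS[830][1] * true_deoxy
print(mbll(d695, d830))  # approximately (0.05, -0.02)
```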
To eliminate high-frequency noise (e.g., heartbeat) and slow overall trends, the data were band-pass filtered between 0.01 and 0.7 Hz. Movement artifacts, defined as concentration changes larger than 0.1 mmol × mm over 0.2 sec, were removed by rejecting the block–channel pairs in which they occurred. For the nonrejected blocks, a baseline was linearly fitted between the mean of the 5 sec preceding the onset of the block and the mean of the 5 sec starting 32 sec after the onset of the block (i.e., after the roughly 18 sec of stimulation plus a 15-sec resting period). For each block, data were then averaged over an 18-sec time window corresponding to the stimulation.
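The artifact criterion and the linear baseline can be sketched as follows. This is a simplified illustration: it assumes a 10-Hz sampling rate (hypothetical here), anchors the baseline line at the centers of the two 5-sec windows (one reasonable reading of “linearly fitted between the means”), and omits the 0.01–0.7 Hz band-pass filter, which in practice would be applied first (e.g., with a zero-phase Butterworth filter).

```python
FS = 10  # assumed sampling rate (Hz); hypothetical for this sketch


def has_artifact(trace, thresh=0.1, window_sec=0.2, fs=FS):
    """True if the signal changes by more than `thresh` (mmol x mm) between
    samples `window_sec` apart; such block-channel pairs are rejected."""
    lag = max(1, round(window_sec * fs))
    return any(abs(trace[i + lag] - trace[i]) > thresh
               for i in range(len(trace) - lag))


def block_mean(trace, pre=5.0, post_start=32.0, post_len=5.0, stim_len=18.0, fs=FS):
    """`trace` starts `pre` sec before block onset. Fit a line through the mean
    of the pre-onset window and the mean of the window starting `post_start`
    sec after onset (each anchored at its window's centre), subtract it, and
    average the corrected signal over the stimulation window."""
    onset = int(pre * fs)
    pre_win = trace[:onset]
    s1 = onset + int(post_start * fs)
    post_win = trace[s1:s1 + int(post_len * fs)]
    if len(post_win) < int(post_len * fs):
        raise ValueError("trace too short for the post-block baseline window")

    def centre(start, n):  # centre time (sec, relative to onset) of a window
        return (start + (n - 1) / 2 - onset) / fs

    t0, t1 = centre(0, len(pre_win)), centre(s1, len(post_win))
    m0 = sum(pre_win) / len(pre_win)
    m1 = sum(post_win) / len(post_win)
    slope = (m1 - m0) / (t1 - t0)
    stim = range(onset, onset + int(stim_len * fs))
    corrected = [trace[i] - (m0 + slope * ((i - onset) / fs - t0)) for i in stim]
    return sum(corrected) / len(corrected)
```

A purely linear drift is removed entirely by this correction, so only changes that depart from the line connecting the two baseline windows survive into the block average.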
We statistically analyzed the data by creating channelwise t maps as well as by conducting ANOVAs over specific ROIs. Two ROIs were defined (Figure 1C): the bilateral temporal areas (superior temporal regions; LH: Channels 3 and 6, right hemisphere [RH]: Channels 17 and 19), known to be responsible for linguistic and nonlinguistic auditory processing in adults and infants (Dehaene-Lambertz, Hertz-Pannier, Dubois, & Dehaene, 2008; Pena et al., 2003; Dehaene-Lambertz, Dehaene, & Hertz-Pannier, 2002; Friederici, Steinhauer, & Pfeifer, 2002), and the bilateral frontal areas (inferior frontal regions; LH: Channels 2 and 5, RH: Channels 13 and 15), proposed to be involved in pattern extraction, higher-order linguistic processing, and verbal memory trace formation in adults and infants (Dehaene-Lambertz et al., 2002, 2006; Friederici, Ruschemeyer, Hahne, & Fiebach, 2003; Friederici, 2002; Dehaene-Lambertz, 2000).
The oxyHb and deoxyHb concentrations were averaged across blocks for each condition. The resulting grand average across all infants is shown in Figure 2A. As a first analysis, we constructed t maps comparing the concentration changes of oxyHb and deoxyHb for the two conditions in each channel (Figure 2B and C). We obtained significantly higher activation for the AAB than for the ABC condition in Channels 3, 4, 6, and 15 with oxyHb and Channels 5, 7, and 15 with deoxyHb (p < .05, uncorrected). These results indicate that the two structures are discriminated in the temporal and frontal areas. A stronger involvement of the LH can also be observed, with three channels showing significant discrimination for oxyHb and two channels for deoxyHb in the LH, as compared with only one per hemoglobin species in the RH.
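The channelwise comparison amounts to one test per channel. A minimal sketch, assuming paired (within-infant) t tests on the per-infant block averages, with no correction for multiple comparisons, as in the reported uncorrected thresholds; the channel number and values in the usage line are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(cond_a, cond_b):
    """Paired t statistic (df = n - 1) on per-infant values of one channel."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def t_map(data_a, data_b):
    """data_x maps channel -> list of per-infant block averages (same infant
    order in both conditions); returns channel -> t, uncorrected."""
    return {ch: paired_t(data_a[ch], data_b[ch]) for ch in data_a}


# Hypothetical values for one channel and four infants.
print(t_map({3: [2.0, 3.0, 4.0, 5.0]}, {3: [1.0, 1.0, 1.0, 1.0]}))  # t = sqrt(15), about 3.87
```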
In a second analysis, we performed a repeated measures ANOVA with factors Condition (AAB/ABC), Hemisphere (LH/RH), and ROI (temporal/frontal) using oxyHb as the dependent variable. We obtained a main effect of Condition (F(1, 21) = 5.005, p = .036) because of greater overall activation in response to the AAB than to the ABC grammar, in line with the results of the channel-by-channel analysis. We also found a significant Condition × Hemisphere interaction (F(1, 21) = 5.227, p = .033), with a greater involvement of the LH in the processing of AAB but not ABC stimuli, confirming the results of the channel-by-channel comparison (Scheffé's post hoc test: AAB > ABC in LH, p = .0006). A similar three-way repeated measures ANOVA with deoxyHb as the dependent variable revealed a significant main effect of ROI (F(1, 21) = 5.945, p = .024) because of greater activation in the temporal than in the frontal areas.
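For a factor with only two levels, such as Condition, the repeated measures main effect has a convenient equivalent: F(1, n − 1) is the square of a paired t computed on each infant's condition means, averaged over Hemisphere and ROI. The sketch below illustrates this identity on a hypothetical data layout. Interactions such as Condition × Hemisphere require the full analysis, for which a statistics package (e.g., the AnovaRM class in statsmodels) would be a better choice.

```python
import math
from statistics import mean, stdev


def condition_main_effect(cells):
    """cells: one dict per infant mapping (condition, hemisphere, roi) -> value.
    Returns F and its degrees of freedom for the main effect of a two-level
    within-subject Condition factor, computed as the squared paired t on
    each infant's condition means (averaged over the other factors)."""
    conditions = sorted({key[0] for infant in cells for key in infant})
    if len(conditions) != 2:
        raise ValueError("this shortcut only applies to two-level factors")
    diffs = []
    for infant in cells:
        m = [mean([v for k, v in infant.items() if k[0] == c]) for c in conditions]
        diffs.append(m[0] - m[1])
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t * t, (1, len(cells) - 1)


# Toy data (hypothetical): three infants whose AAB cells exceed ABC by 1, 2, 3.
toy = [
    {(c, h, r): (d if c == "AAB" else 0.0)
     for c in ("AAB", "ABC") for h in ("LH", "RH") for r in ("temporal", "frontal")}
    for d in (1.0, 2.0, 3.0)
]
print(condition_main_effect(toy))  # F of about 12.0 with df (1, 2)
```

With 22 infants, as in Experiment 1, the same computation yields degrees of freedom (1, 21).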
These results suggest that the newborn brain can distinguish structures based on initial repetitions from random but otherwise similar control sequences. This discrimination is most pronounced in the temporal areas of the LH, consistent with previous evidence about the left lateralization of language in adults (Kimura, 1967) and infants (Telkemeyer et al., 2009; Pena et al., 2003; Dehaene-Lambertz et al., 2002). Indeed, the overall pattern of results, that is, strong temporal and weaker but significant frontal discrimination predominantly in the LH, is similar to what was previously found for the discrimination between final repetitions and random sequences (ABB vs. ABC) in newborns (Gervain et al., 2008).
Neonates are thus able to discriminate both sequence-initial and sequence-final repetitions from random sequences. However, the above results do not reveal whether they encode only the identity relation or both the identity relation and its position. Because language unfolds over time, sequential position is a basic feature of language structure. Indeed, languages use sequential order to distinguish the participants of an action (Katie saw John vs. John saw Katie), questions from declarative statements (Is he tall? vs. He is tall), different lexical meanings (pan vs. nap), and so forth. Edge positions play a particularly important role (Endress, Nespor, & Mehler, 2009). They are privileged sites for different linguistic processes. Word beginnings, for instance, are relevant for recognizing words and retrieving them from the mental lexicon (Swingley, 2005; Redford & Diehl, 1999). Morphological processes like affixation and case marking also preferentially target the edges, typically the ends of words (Dryer, 1992; Greenberg, 1978).
In Experiment 2, we therefore tested whether 1-day-old infants could discriminate AAB and ABB sequences.
The AAB grammar used in the experiment was identical to Experiment 1. For the ABB grammar, X syllables served as the initial unrepeated syllable in half of the blocks and Y syllables served as the repeated one, and vice versa in the other half of the blocks. The two grammars were thus matched to each other on all nonstructural properties.
We used an alternating/non-alternating design (Figure 1B). Alternating/non-alternating designs are often used with infants in behavioral (Maye, Werker, & Gerken, 2002; Best & Jones, 1998) and NIRS (Sato, Sogabe, & Mazuka, 2010) studies to test fine-grained discrimination. If infants are able to distinguish between the two types of stimuli, then they will perceive blocks in which the two stimulus types alternate as different from blocks in which only one type of stimulus is presented. The alternating and non-alternating blocks strictly alternated, with half of the infants hearing an alternating block first and the other half hearing a non-alternating block first. In half of the alternating blocks, the first sequence was an ABB item, and in the other half, an AAB item. For half of the infants who heard an alternating block first, this block started with an AAB item, and for the other half, with an ABB item. For half of the infants who heard a non-alternating block first, this block was an AAB block, and for the other half, an ABB block. Their order was randomized and counterbalanced across infants.
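The block structure can be sketched as follows. The number of items per block, the number of block pairs, and the strict item-by-item alternation within alternating blocks are assumptions made for illustration; the sketch balances, and randomly orders, the starting item of the alternating blocks and the grammar of the non-alternating blocks, as described above.

```python
import random

FLIP = {"AAB": "ABB", "ABB": "AAB"}


def make_block(kind, first, n_items=10):
    """Grammar label of each item in one block (n_items is hypothetical)."""
    if kind == "non-alternating":
        return [first] * n_items
    if kind == "alternating":
        return [first if i % 2 == 0 else FLIP[first] for i in range(n_items)]
    raise ValueError(f"unknown block kind: {kind}")


def session(n_pairs, start_with_alternating, rng=None):
    """Strictly alternate alternating and non-alternating blocks. Half of the
    alternating blocks start with AAB and half with ABB; half of the
    non-alternating blocks are AAB and half ABB; both in random order."""
    if n_pairs % 2:
        raise ValueError("n_pairs must be even to balance AAB and ABB")
    rng = rng or random.Random()
    alt_starts = ["AAB", "ABB"] * (n_pairs // 2)
    nonalt_types = ["AAB", "ABB"] * (n_pairs // 2)
    rng.shuffle(alt_starts)
    rng.shuffle(nonalt_types)
    blocks = []
    for a, n in zip(alt_starts, nonalt_types):
        pair = [("alternating", make_block("alternating", a)),
                ("non-alternating", make_block("non-alternating", n))]
        blocks.extend(pair if start_with_alternating else pair[::-1])
    return blocks


for kind, items in session(2, start_with_alternating=True, rng=random.Random(1))[:2]:
    print(kind, items[:4])
```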
A new group of twenty healthy, full-term neonates (15 girls; mean age = 1.05 days, range = 0–3 days; Apgar score ≥ 8) born in the Vancouver area participated in Experiment 2. Data from 11 additional infants were collected but excluded from the data analysis, as they (1) failed to complete the experiment because of fussiness and crying (six infants) or (2) provided poor quality data because of large motion artifacts or thick hair (five infants). All parents gave informed consent before participation. The ethics boards of the University of British Columbia and BC Women's Hospital, where the experiments took place, granted permission.
The procedure was identical to Experiment 1.
Data Processing and Analysis
Data processing and analysis were identical to Experiment 1.
The grand-averaged results of Experiment 2 are shown in Figure 3A. Using oxyHb as the dependent measure, the channel-by-channel t map of the two conditions (Figure 3B and C) revealed significantly greater activation in response to the non-alternating blocks in Channels 2, 4, 19, and 21 (p < .05, uncorrected), as well as a trend toward greater activation in response to non-alternating blocks in Channels 5 and 9 (p < .1). When deoxyHb was used as the dependent variable, Channels 3, 4, 13, 15, 16, and 17 (p < .05) exhibited greater activation, that is, a larger decrease in concentration, for non-alternating trials. These findings indicate that the two conditions are discriminated bilaterally in both ROIs, with more channels responding differentially in the frontal than in the temporal areas.
A repeated measures ANOVA with factors Condition (alternating/non-alternating), Hemisphere (LH/RH), and ROI (temporal/frontal) using oxyHb as the dependent variable revealed a main effect of Condition (F(1, 19) = 5.529, p = .030), as non-alternating blocks elicited greater neural activity than alternating blocks. A significant main effect of ROI was also observed (F(1, 19) = 7.070, p = .016) because of greater general activation in the temporal than in the frontal ROI. A similar ANOVA using deoxyHb as the dependent variable revealed a highly significant main effect of Condition (F(1, 19) = 9.162, p = .007) because of increased activation for non-alternating as compared with alternating blocks. The main effect of ROI was marginally significant (F(1, 19) = 4.089, p = .057), showing a somewhat greater activation in the temporal than in the frontal ROI.
These results show that newborns can discriminate the AAB and ABB patterns, as they respond differently when either pattern is presented alone than when the two are presented in alternation. Importantly, stimuli were randomly jittered (separated by silences of varying duration) within blocks, rendering the exact timing of repetitions unpredictable in both alternating and non-alternating blocks. Therefore, a simple difference in the periodic recurrence of repetitions between the two conditions (e.g., evenly spaced repetitions in the non-alternating blocks versus more variable spacing in the alternating blocks) cannot account for the observed brain responses.
These findings suggest that neonates are able to distinguish the initial versus the final position of the repetition. Newborns responded more strongly to the non-alternating than to the alternating blocks, suggesting that they extracted the repetition and its position whenever these were unambiguously available. This resulted in increased activity and higher metabolic costs in the non-alternating blocks than in the alternating blocks, in which no unambiguous regularity could be extracted.
Discriminating two repetition-based patterns that differ only in the position of the repetition necessarily requires extracting and binding at least two features of the input: positional and relational information. The spatial localization of the differential response confirms the presence of such an integrative mechanism. In Experiment 2, the differential response was observed in more frontal channels than in Experiment 1. The frontal channels covered the inferior frontal regions (possibly including Broca's area, although this cannot be fully ascertained without structural brain scans). These areas have been reported to support sequence learning and phrase- or sentence-level integrative processes (Dehaene-Lambertz et al., 2006).
Does the ability to discriminate the initial and final positions result from a preferential processing of one edge position over the other? In language, the two positions are often argued to play different roles (Bybee, Pagliuca, & Perkins, 1990; Cutler, Hawkins, & Gilligan, 1985). The initial position is typically more informative about lexical identity and is preferentially recruited for lexical access and retrieval, that is, the identification of entries in the mental lexicon (Cutler & Clifton, 1999). The final position, by contrast, is more often involved in morphological processes. Indeed, left–right asymmetries favoring the right edge have been observed in phonology, morphology, and syntax (Cinque, 2009; Hawkins & Gilligan, 1988; Cutler et al., 1985). For instance, suffixation is strongly preferred over prefixation (Dryer, 1992; Greenberg, 1978), with certain morphological phenomena, such as case marking, being expressed through suffixation in about 90% of the languages examined (Dryer, 2008). Indeed, some accounts of language processing (Cutler et al., 1985) argue that morphosyntactic phenomena often target the final position, precisely because the temporal unfolding of speech renders the initial position particularly salient for word recognition. It is, therefore, possible that one of the positions might be more salient or easier to process than the other. In fact, some languages even exhibit specific constraints preferentially assigning repetitions to sequence-final positions. Semitic languages, for instance, allow the repetition of the final, but not of the initial, consonant in their lexical roots (Hebrew: smm “drug,” but *ssm; Berent, Vaknin, & Shimron, 2004; Berent & Shimron, 2003; McCarthy, 1986). It is thus interesting to test whether one edge position is favored over the other at birth.
In Experiment 3, we directly contrasted the repetition-initial and repetition-final grammars to compare their processing costs.
The same ABB and AAB grammars were used as in Experiment 2, except that the ABB and AAB blocks were presented in a simple block design, as was used in Experiment 1 (Figure 1A).
A new group of 24 healthy, full-term neonates (14 girls; mean age = 1.5 days, range = 0–3 days; Apgar score ≥ 8) born in the Vancouver area participated in Experiment 3. Data from 11 additional infants were collected but excluded from the data analysis as they (1) failed to complete the experiment because of fussiness and crying (eight infants) or (2) provided poor quality data because of large motion artifacts or thick hair (three infants). All parents gave informed consent before participation. The ethics boards of the University of British Columbia and BC Women's Hospital, where the experiments took place, granted permission.
The three groups of newborns in Experiments 1–3 did not differ in age [one-factor ANOVA with the between-subject factor Experiment (Exp 1/Exp 2/Exp 3), F(2, 63) = 1.577, p > .05].
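For reference, this between-subject comparison of ages is a standard one-way ANOVA; a minimal sketch follows. With group sizes of 22, 20, and 24 infants, the degrees of freedom are (3 − 1, 66 − 3) = (2, 63), matching those reported. The data in the usage line are a toy example.

```python
from statistics import mean


def one_way_anova(*groups):
    """Between-subjects one-way ANOVA: returns F and (df_between, df_within)."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), (df_between, df_within)


print(one_way_anova([1, 2, 3], [2, 3, 4], [3, 4, 5]))  # (3.0, (2, 6))
```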
The procedure was identical to Experiments 1 and 2.
Data Processing and Analysis
Data processing and analysis were identical to Experiments 1 and 2.
The grand-averaged results of Experiment 3 are illustrated in Figure 4A. The channel-by-channel t map revealed greater activation in response to AAB than to ABB in Channel 8 (p < .05) for oxyHb and in Channels 2, 5, and 19 (p < .05) for deoxyHb, suggesting a small advantage for the initial position (Figure 4B and C).
In a three-way ANOVA with factors Condition (AAB/ABB), Hemisphere (LH/RH), and ROI (temporal/frontal) with oxyHb, we obtained a marginally significant main effect of ROI (F(1, 23) = 3.984, p = .058) because of a somewhat larger overall activation in the temporal than in the frontal areas. A similar ANOVA for deoxyHb yielded a highly significant main effect for ROI (F(1, 23) = 8.467, p = .008), confirming that the temporal areas showed more activation than the frontal ones. Crucially, the factor Condition did not yield a significant main effect or interaction, suggesting that there was no difference between the processing costs of the two grammars.
To further explore the processing demands of the two grammars, we conducted the same analysis on the non-alternating blocks of Experiment 2 (Figure 5). These are identical to the blocks of Experiment 3, except that there were half as many of them per condition and they were presented in an interleaved fashion with alternating blocks. (Two infants provided fewer than two valid, noise- and artifact-free trials in one or both types of non-alternating blocks, so their data were not included in the analysis.) The channel-by-channel analysis revealed no significant difference between conditions in any of the channels for oxyHb. For deoxyHb, a greater decrease for AAB than for ABB was observed in Channel 8 (p < .01), but no other significant differences were observed.
In a three-way ANOVA with factors Condition (AAB/ABB), Hemisphere (LH/RH), and ROI (temporal/frontal) with oxyHb, there was a marginally significant main effect of ROI (F(1, 17) = 0.029, p = .055), reflecting greater activation in the temporal as compared with the frontal areas. In a similar analysis with deoxyHb, there were no significant effects.
The channel-by-channel comparisons for Experiment 3 and for the non-alternating blocks of Experiment 2 showed a slight advantage for the initial repetition, which, however, did not reach significance in the ANOVAs. Furthermore, in both analyses, the temporal areas showed greater activation than the frontal ones. This confirms that the stimuli were processed but that neither structure was favored. This result is compatible with the fact that both positions play significant, albeit different, roles in language perception and processing. Indeed, both edge positions are privileged over sequence-medial positions as targets of linguistic processes (Dryer, 1992). One possible explanation comes from memory constraints on sequence learning: it has long been observed that the initial and final elements of sequences are better processed and remembered than medial elements, both in language (Endress et al., 2009) and in other domains (Ng & Maybery, 2002).
In three NIRS experiments, we tested whether newborns were able to encode two fundamental aspects of language structure: relational and sequential information. We found that the newborn brain successfully encodes both types of information and integrates them into a coherent structural pattern (Experiments 1 and 2). We found no reliable differences in processing cost between the initial and final positions (Experiment 3). The three experiments identified responses with different spatial distributions to different aspects of speech. These findings mesh well with the limited evidence that exists regarding the neural correlates of speech processing in the developing brain. Left-lateralized brain activity in the superior temporal regions has been observed in response to speech stimuli in young infants in a number of studies (Dehaene-Lambertz et al., 2002, 2006; Pena et al., 2003). Furthermore, activation in the inferior frontal gyrus (Broca's area) has been reported to be associated with integrative processes over larger speech units and memory for repeated sentence presentation in 3-month-old infants (Dehaene-Lambertz et al., 2006). Our findings thus suggest that the frontal and temporal areas of the neonate brain may be part of a functional network subserving speech and language processing.
It is important to note that we only used speech stimuli in our experiments. Therefore, our results cannot determine whether these brain mechanisms are specific to language or more general to other auditory stimuli. This important question awaits further investigation.
Taken together, these results imply that human infants already possess at birth some of the perceptual and combinatorial abilities required for efficient language acquisition. The ability to integrate positional with relational information and to discriminate different sequential positions suggests that newborns are prepared to track both lexical and morphosyntactic information, laying a foundation for acquiring the grammar and the vocabulary of their native language.
This research was funded by a Consortium grant 220020096 entitled “Program grant to develop near-infrared spectroscopy in combination with ERPs and fMRI to assess cognitive development in human infants and young children” from the McDonnell Foundation (to J. F. W. and J. G.), NSERC grant 81103 (to J. F. W.), NIH grant R01DC003277 (to I. B.), and ANR Jeune Chercheur as well as a Fyssen Foundation Startup Grant (to J. G.).
Reprint requests should be sent to Judit Gervain, Laboratoire Psychologie de la Perception (UMR 8158), Centre National de la Recherche Scientifique, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris, France, or via e-mail: firstname.lastname@example.org.