Learning a new word requires discrimination between a novel sequence of sounds and similar known words. We investigated whether semantic information facilitates the acquisition of new phonological representations in adults and whether this learning enhancement is modulated by overnight consolidation. Participants learned novel spoken words either consistently associated with a visual referent or with no consistent meaning. An auditory oddball task tested discrimination of these newly learned phonological forms from known words. The MMN, an electrophysiological measure of auditory discrimination, was only elicited for words learned with a consistent semantic association. Immediately after training, this semantic benefit on auditory discrimination was linked to explicit learning of the associations, where participants with greater semantic learning exhibited a larger MMN. However, although the semantic-associated words continued to show greater auditory discrimination than nonassociated words after consolidation, the MMN was no longer related to performance in learning the semantic associations. We suggest that the provision of semantic systematicity directly impacts upon the development of new phonological representations and that a period of offline consolidation may promote the abstraction of these representations.
Learning new words is an ability that persists into adulthood. A critical feature of learning a new spoken word is the development of a sufficiently well-specified phonological representation to allow discrimination from similar-sounding existing words. Phonological specification of a new spoken word is hence an early yet critical component of the full acquisition process (Page & Norris, 2009; Baddeley, Gathercole, & Papagno, 1998; Papagno & Vallar, 1992). Although learning new phonological form representations can occur rapidly in the brain (Shtyrov, Nikulin, & Pulvermüller, 2010), it remains relatively unknown what factors modulate the acquisition of new words in adulthood. Importantly, as influential models of human spoken word recognition argue for an interaction between phonology and meaning in recognizing known words (Gaskell & Marslen-Wilson, 1997, 2002; McClelland & Elman, 1986), and as these links between phonology and meaning must at some point have been acquired, one possibility is that meaning also facilitates the learning of new phonological form representations. Furthermore, although overnight consolidation has been established as an important factor in some aspects of word learning including lexical integration (e.g., Gaskell & Dumay, 2003) and generalization (e.g., Tamminen, Davis, Merkx, & Rastle, 2012), it is unknown how consolidation might impact on the development of lower-level phonological form representations or how this influence might be modulated by the provision of systematic semantic information during learning. Here we use the exquisite temporal precision of ERPs to identify the exact moment at which participants distinguish newly learned spoken words from known words. Combining this with a novel learning paradigm enabled us to investigate two possible influences on acquiring a new phonological form representation: (i) the role of offline consolidation and (ii) the provision of systematic semantic information.
It is well known that offline consolidation, possibly related to sleep, can improve perceptual and motor abilities learned during wake (e.g., Korman et al., 2007; Karni, Tanne, Rubenstein, Askenasy, & Sagi, 1994). Recent years have seen great interest in the related possibility that consolidation may play a critical role in some aspects of word learning. In a series of studies on the integration of novel spoken words into the mental lexicon, Gaskell and Dumay (2003) demonstrated that newly learned words (e.g., cathedruke) can come to compete with similar existing words (e.g., cathedral), but only if the initial learning phase is followed by a period of offline consolidation (see also Bowers, Davis, & Hanley, 2005, for an analogous study using visual presentation). Sleep appears to provide an optimal state for these consolidation processes (Tamminen, Payne, Stickgold, Wamsley, & Gaskell, 2010; Dumay & Gaskell, 2007), but the integration of novel words into the mental lexicon is also possible during wakefulness under certain conditions (Lindsay & Gaskell, 2013; Szmalec, Page, & Duyck, 2012; Fernandes, Kolinsky, & Ventura, 2009). In addition to lexical integration processes, consolidation also appears to be critical for the abstraction of newly learned information, in such a way that promotes linguistic generalization. For example, Gomez, Bootzin, and Nadel (2006) demonstrated that infants who took a nap after a spoken learning task were able to extract an abstract rule relating elements in the training set that could be applied to untrained stimuli in a manner that infants who failed to nap could not. Similarly, Tamminen et al. (2012) showed that adults who learned a series of words with an internal morphological structure (e.g., teachnule, buildnule, sleepnule) could apply their knowledge of the element [-nule] to untrained stimuli, but only following a period of overnight consolidation. 
These types of effects have been characterized within a complementary learning systems (CLS) account (e.g., Davis & Gaskell, 2009; McClelland, McNaughton, & O'Reilly, 1995), suggesting that newly learned words are initially stored as distinct episodic representations and that one function of consolidation may be to transfer these episodic representations to abstract lexical representations. However, although consolidation appears to be very significant in these higher-level word learning processes, it is unknown how consolidation may impact on lower-level phonological form learning processes (e.g., Shtyrov et al., 2010) that are a necessary prerequisite.
Previous investigations of the influence of semantic information on novel word learning are far less consistent. In studies testing explicit memory for learned whole words (e.g., Rueckl & Olds, 1993), the provision of meaning has been shown to be broadly advantageous. Associative learning between a word and visual referent (Breitenstein et al., 2005), semantic richness of implicitly learned words (Rabovsky, Sommer, & Abdel Rahman, 2012), and semantic relatedness of new word meanings (Rodd et al., 2012) have a beneficial impact on measures of word recall and recognition memory. However, this consistently beneficial effect of semantic information on explicit measures of word learning does not always translate to measures of online lexical processing such as speeded naming (e.g., Hultén, Vihla, Laine, & Salmelin, 2009; Sandak et al., 2004). Furthermore, it is difficult to reconcile with the lexical integration literature. Dumay, Gaskell, and Feng (2004) trained participants on novel words (e.g., cathedruke), which were presented in either a meaningful sentential context or in isolation in a phoneme monitoring task. Although novel words introduced in both conditions came to compete with existing words (suggesting lexical integration), those introduced in a sentential context required a longer period of consolidation to do so. Similarly, Takashima, Bakker, van Hell, Janzen, and McQueen (2014) trained participants on a set of novel spoken words, half of which were associated with a picture. They observed that only those trained without a picture engaged in lexical competition the subsequent day. Conversely, Henderson, Weighall, and Gaskell (2013) found that words trained in both semantic and nonsemantic contexts yielded competition effects in children the day after learning. 
Similarly, although some studies have reported that the provision of semantic information is necessary to achieve generalization in adult word learning paradigms (Tamminen et al., 2012; Merkx, Rastle, & Davis, 2011, both for morphological rule learning), others have reported generalization effects even in the absence of semantic information (Taylor, Plunkett, & Nation, 2011, in the case of artificial orthography learning). Overall, then, although the provision of semantic information has a strong influence on explicit memory for learned words, the findings regarding higher-level word learning processes such as lexical integration and generalization are much less clear. By examining how the provision of semantic information influences the acquisition of lower-level phonological form representations before and after consolidation, our study goes some way toward resolving these contradictory effects.
We thus investigated these two critical issues of semantic exposure and consolidation in a single design. We asked, first, whether the provision of systematic semantic information about a novel word would enhance learning of the low-level phonological form of that novel word; second, we wanted to know whether a period of offline consolidation would impact upon the emergence of phonological representations or indeed modulate any semantic influence on this acquisition process. CLS accounts predict that consolidation can both strengthen access to new word representations and promote their abstraction from episodic knowledge (Davis & Gaskell, 2009; McClelland et al., 1995). Enhanced access to new word representations has been observed in faster responses to the phonological features of new words (Snoeren, Gaskell, & Di Betta, 2009) and gradually improving recognition and recall of new phonological forms over consolidation (Tamminen et al., 2010; Davis, Di Betta, Macdonald, & Gaskell, 2009; Dumay & Gaskell, 2007; Dumay et al., 2004). Abstraction from episodic knowledge after consolidation has been observed for higher-level aspects of word learning such as semantic integration (Tamminen & Gaskell, 2013), morphological and grammatical rule learning (Tamminen et al., 2012; St. Clair & Monaghan, 2008), and for nonlinguistic statistical learning (Durrant, Taylor, Carney, & Lewis, 2011; Ellenbogen, Hu, Payne, Titone, & Walker, 2007). However, it is not established whether abstraction may also operate on earlier phonological form learning processes. Hence, we looked for a change both in access to new phonological form representations and in their dependence on episodic knowledge.
Our first question regarding a semantic benefit on phonological form learning raised the critical issue of how to manipulate the provision of semantic information. In a key study in which the provision of semantic information was actually disadvantageous to lexical integration (Takashima et al., 2014), participants were required to learn novel words via phoneme monitoring, where some were also presented with a visual referent. One possibility is that the semantic disadvantage in this study arose because learning two novel pieces of information (a new phonological form and a new meaning) is more cognitively demanding than learning just one novel piece of information. It is possible that, when the amount of information and learning goals are equated, the acquisition of phonological representations will be benefited by systematic semantic associations during training, because of associative links between forms and referents leading to a stronger memory trace (e.g., Leach & Samuel, 2007). Thus, we employed a new paradigm in which participants learned novel spoken words that were always accompanied by a picture. However, in the correlated condition, there was a strong relationship between the novel words and their visual referent across trials, whereas in the uncorrelated condition, there was no relationship between the novel words and their visual referents across trials. Task goals were thus perfectly equated across conditions, and critically, participants were unaware of two categorically different learning conditions.
Following learning, we tested the precision of newly acquired phonological form representations using the MMN potential as an electrophysiological measure of auditory discrimination, both immediately (Day 1) and 24 hr after participants acquired a novel vocabulary (Day 2). The MMN has been shown to be a sensitive index of novel word learning and discrimination from known words and is critically elicited in the absence of attention to the speech stream (and thus is not contaminated by specific processing goals). Shtyrov et al. (2010) used the evoked MMN as an index of novel word learning: A pseudoword was presented infrequently against a stream of known words, where the infrequent pseudoword differed by one phoneme from the known word (i.e., pipe–pite). By the end of a 14-min exposure session, the pseudoword elicited an MMN response, which Shtyrov et al. (2010) suggested was the result of rapidly forming a neural memory trace of the novel pseudowords. The MMN was elicited in response to a precise recognition point in the speech signal, where the novel pseudoword could be discriminated from the known word standard, and it therefore measured the perceived phonological contrast between the novel pseudoword and known word. Using the MMN in a similar design in our test phase therefore allowed us to address whether systematic semantic information enhances the acquisition of phonological representations and to test whether the acquisition of these phonological representations (or any semantic influence therein) was modulated by overnight consolidation.
Twenty-four right-handed native English speakers (mean age = 21.5 years, SD = 2.59 years, range = 18–27 years; 15 women) completed the study. The participants had no known auditory, language, or learning difficulties. All participants were recruited from Royal Holloway and were paid for their participation. The study received ethical approval from the Psychology Department ethics committee at Royal Holloway.
Materials and Design
There were three conditions in the learning task: the correlated condition, where there was a strong association between the novel words and picture referents; the uncorrelated condition, where there was no association between the novel words and picture referents; and the known word condition, which contained existing words and their corresponding referents. Participants were exposed to six monosyllabic spoken pseudowords in each learning condition (therefore 12 novel pseudowords in total) and six known words. The novel pseudowords were assigned to each learning condition as shown in Table 1. The pseudowords consisted of six minimal pairs drawn from a larger pool of items for each subject. Two of the pseudoword minimal pairs made up minimal triplets, which consisted of two pseudowords and one known word; these triplets were included to be later used in the MMN test. Each item consisted of a consonant–vowel token taken from the naturally spoken known word recording (e.g., /kɑı/, as in kite) cross-spliced onto a /t/, /p/ or /k/ voiceless stop consonant. These were taken from the onset of the final voiceless stop consonant in /kɑıt/, /pɑıp/, and /bɑık/, respectively. This cross-splicing meant that each minimal set was identical until the final stop consonant (e.g., /kɑıt/ or /kɑıp/), with no acoustic or coarticulatory differences before this disambiguation point (in the subsequent MMN sessions, these points would be the trigger to which we locked our ERP waveforms). Each item could thus only be uniquely recognized at the final phoneme. All spoken stimuli were recorded and edited in Cool Edit 2000, and peak sound energy was equated across items. Inclusion of known words in the learning task equated prior exposure to both the pseudowords and known words that would later be presented in the oddball task. All pseudowords were counterbalanced between the correlated and uncorrelated learning conditions.
|Correlated Words|Known Words|Uncorrelated Words|
|---|---|---|
|boap /boʊp/|boat /boʊt/|boak /boʊk/|
|kipe /kɑıp/|kite /kɑıt/|kike /kɑık/|
|jep /dʒɛp/|jet /dʒɛt/|clet /klɛt/|
|vate /veıt/|stick /stık/|stit /stıt/|
|pite /pɑıt/|pipe /pɑıp/|vape /veıp/|
|clep /klɛp/|bike /bɑık/|bipe /bɑıp/|
The IPA transcription is shown beside each word. The middle column shows the known words (in bold). The column to the left shows the minimal pairs with these known words (in bold) that would be used in the correlated learning condition. The column to the right shows a similar list that would be used for the uncorrelated condition. In the subsequent MMN sessions, we would only use the minimal triplets (the top two lines). However, to make the learning task sufficiently challenging, we used extra known words that had a minimal pair (which we could allocate to either learning condition) and novel words that were minimal pairs with each other (these are shown in italics).
On each learning trial, the auditory presentation of a word was followed by two pictures. In the correlated word condition, one of these pictures was frequently a referent object, and the other picture was a nonreferent foil object. In the uncorrelated word condition, both pictures were always nonreferent foil objects. In the known word condition, one picture was frequently the known word referent (e.g., a picture of a kite), and the other picture was a nonreferent foil object (Figure 1). The visual stimuli consisted of six known objects, which were prototypical referents of the known words, and 30 novel objects, which were obscure real objects. For each participant, six novel objects were randomly selected as referents for the six correlated words; the remaining 24 were nonassociated foil objects, which were shown with a different word on each trial. After participants had been exposed to all 18 words and 36 pictures, the foil pictures were reassigned to different words on the following round of trials. A different foil picture was thus presented with each word on each round of trials. One foil was presented beside the referent category picture for correlated and known words, and two foils were presented with the uncorrelated words. There were 40 exposures to each word over the course of the learning task.
The six correlated pseudowords were thus frequently associated with the same novel object; the six uncorrelated pseudowords were presented without a consistent picture association. The participants' task was to respond as to whether one of the two pictures was the referent for that word or whether the referent was not present. In the known and correlated word conditions, the referent could either be present (2/3 trials) or absent (1/3 trials). On referent-absent trials, a different referent object from that condition was presented on every trial. This protocol ensured that accuracy for the correlated words emerged from learning a one-to-one mapping between a correlated pseudoword and referent, rather than simply a category of “referent” objects. After responding, participants received feedback on whether they had selected the correct or incorrect referent. To maintain response motivation and attentiveness in the uncorrelated condition, positive feedback was randomly given at chance levels on each exposure. Because chance levels were considered 1/3 for this purpose (based on participants being able to respond “left object,” “right object,” or “neither object” on each trial), this meant that 1/3 of uncorrelated word responses were followed by positive feedback. This positive feedback was randomly interspersed with the 2/3 of negative feedback trials over the course of the experiment.
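The trial and feedback proportions described above can be tallied in a short sketch. It is illustrative only: all names are hypothetical, and drawing feedback independently on each uncorrelated trial with probability 1/3 is an assumption about how "randomly given at chance levels" might be implemented, not a description of the original E-Prime script.

```python
import random
from fractions import Fraction

N_WORDS_PER_CONDITION = 6             # correlated, uncorrelated, and known words
N_EXPOSURES = 40                      # exposures per word
P_REFERENT_PRESENT = Fraction(2, 3)   # known and correlated conditions
P_POSITIVE_FEEDBACK = Fraction(1, 3)  # uncorrelated condition (three response options)


def expected_counts():
    """Expected trial counts implied by the design proportions."""
    trials_per_condition = N_WORDS_PER_CONDITION * N_EXPOSURES
    return {
        "total_trials": 3 * trials_per_condition,
        "referent_present_per_condition": trials_per_condition * P_REFERENT_PRESENT,
        "uncorrelated_positive_feedback": trials_per_condition * P_POSITIVE_FEEDBACK,
    }


def draw_uncorrelated_feedback(n_trials, seed=None):
    """Assumed scheme: independent positive feedback with probability 1/3 per trial."""
    rng = random.Random(seed)
    return [rng.random() < float(P_POSITIVE_FEEDBACK) for _ in range(n_trials)]


counts = expected_counts()
feedback = draw_uncorrelated_feedback(N_WORDS_PER_CONDITION * N_EXPOSURES, seed=1)
```

Under these assumptions the design yields 720 learning trials, with positive feedback expected on 80 of the 240 uncorrelated trials.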
Test of Phonological Form Learning
The MMN is an ERP measure most commonly evoked in passive oddball paradigms to a rare “deviant” stimulus within a stream of “standard” filler stimuli (Näätänen et al., 1997). The MMN is suggested to measure a memory trace evoked by the deviant (Pulvermüller & Shtyrov, 2006) or prediction error from the standard auditory stream (Winkler, 2007) and is highly sensitive to a range of lexical variables (e.g., Shtyrov, Kimppa, Pulvermüller, & Kujala, 2011). We followed the design of Shtyrov et al. (2010) by presenting novel word deviants against a background of known word standards. Critically, in this design, the deviant stimulus must be detected as phonologically distinct from the standard to elicit an MMN (e.g., Shtyrov et al., 2010). It hence provides a pure measure of relative discrimination of newly acquired spoken words at a neural level, eliminating confounds of task goals and explicit memory processes commonly evoked by behavioral testing.
To present both a newly learned correlated and uncorrelated word against a competitor environment of known words in the oddball task, we employed a multifeature oddball paradigm (Näätänen, Pakarinen, Rinne, & Takegata, 2004), in which we interspersed two repetitions of a known filler (e.g., boat) with a newly learned pseudoword deviant. The task started with 15 presentations of the known token (e.g., boat) to habituate participants to the filler stimulus (Fisher, Grant, Smith, & Knott, 2011). There were then 900 trials in total, constituting 300 pseudoword exposures (150 correlated, e.g., boap; 150 uncorrelated, e.g., boak) and 600 known filler exposures (e.g., boat), with an 800 msec SOA (Shtyrov et al., 2010). The pseudowords and fillers thus had a 1/3 and 2/3 presentation probability, respectively (Figure 2). A different minimal triplet was used in the oddball task on each day. Counterbalancing of the critical pseudowords between correlated and uncorrelated conditions and day of testing (Day 1/Day 2) meant that the same sounds were present in each novel pseudoword category, ensuring that any evoked neural response emerged from the learned psycholinguistic properties of that word rather than from salient acoustic properties.
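The structure of the oddball stream can be sketched as follows. This is a minimal illustration, not the presentation script used in the study: the function name is hypothetical, and placing each deviant after exactly two standards with a randomly shuffled deviant order is an assumption based on the description above.

```python
import random

SOA_S = 0.8  # 800 msec stimulus onset asynchrony


def build_oddball_sequence(standard, correlated, uncorrelated,
                           n_per_deviant=150, n_habituation=15, seed=None):
    """Return a list of stimulus labels: habituation standards, then the main stream."""
    rng = random.Random(seed)
    deviants = [correlated] * n_per_deviant + [uncorrelated] * n_per_deviant
    rng.shuffle(deviants)  # assumed: deviant types occur in random order
    sequence = [standard] * n_habituation
    for deviant in deviants:
        # two repetitions of the known filler, then one pseudoword deviant
        sequence.extend([standard, standard, deviant])
    return sequence


sequence = build_oddball_sequence("boat", "boap", "boak", seed=0)
main_stream = sequence[15:]                   # the 900 trials after habituation
duration_min = len(sequence) * SOA_S / 60     # approximate session length in minutes
```

With these parameters the main stream contains 600 standards and 300 deviants, and the full sequence of 915 stimuli lasts roughly 12 minutes at an 800-msec SOA.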
Recognition Memory Task
On Day 2, participants completed a recognition memory test following the oddball task. The foil words diverged from the novel pseudowords at the final consonant, where items were voicing-contrast minimal pairs in all but one case. Examples of the recognition foils are given in Table 2.
|Correlated Words|Recognition Foils|Uncorrelated Words|Recognition Foils|
Association Recall Task
After the oddball session and recognition memory task on Day 2, participants were tested on their memory of the word–picture associations learned on Day 1. Participants responded using a sheet of paper with an array of 30 pictures from the learning task, six of which were referents for the correlated words and 24 of which were foil pictures.
The learning task was run on Day 1, with stimuli delivered via headphones using E-Prime (Psychology Software Tools, Sharpsburg, PA). There were 40 exposures to all stimuli and 720 trials in total. Referent-present and referent-absent trials were randomized, and the order of items was randomized within each round of exposures. On each trial, participants first heard the spoken word followed by the presentation of two pictures. Instructions explained that the task required learning which words went with which objects. Participants responded using arrow keys based on whether the left, right, or neither picture was the referent object.
The oddball task was run after the learning task on Day 1, and on Day 2, when participants returned to the laboratory after a 24-hr delay. Stimuli were presented through headphones while participants watched a silent video to divert attention from the auditory stream. A questionnaire about detailed events in the video at the end of each day yielded a mean accuracy of 81.22% (SD = 7.61) on Day 1 and 81.94% (SD = 7.93) on Day 2, verifying that participants had been sufficiently engaged in the video.
In the recognition memory task on Day 2, participants heard each pseudoword and foil presented in isolation and responded via keyboard to indicate whether that item was familiar or unfamiliar.
Finally, in the association recall task on Day 2, participants were presented with all 12 trained pseudowords via the headphones and were instructed to write each word under its corresponding picture if they were confident the word went with that picture. The task was self-paced, and participants made a key-press to advance to the next word. At the presentation of each word, a number also appeared on the screen, which participants were instructed to write beside the word on their response sheet. This ensured accuracy in coding responses in case of difficulties reading the handwritten responses, as the pseudoword forms were highly similar.
EEG Preprocessing and ERP Formation
The EEG data were acquired using a 64-channel Biosemi (Amsterdam, the Netherlands) ActiveTwo system, using a 10–20 setup. Two additional electrodes were placed on the outer canthi of each eye, and two electrodes were placed above and below the right eye to record saccadic and blink oculomotor artifacts, respectively. Two electrodes were placed on the right and left mastoid to re-reference the data offline, and the EEG was recorded using a 2000-Hz sampling rate.
EEG data were downsampled to 250 Hz and filtered with a 1-Hz high-pass filter. An independent components analysis, which used tools from both Fieldtrip and EEGlab, removed oculomotor artifacts (Shimi & Astle, 2013; Oostenveld, Fries, Maris, & Schoffelen, 2011; Delorme & Makeig, 2004; Ungureanu, Bigan, Strungaru, & Lazarescu, 2004). To account for the different disambiguation points between the minimal pair triplets used on each day, the data were epoched such that the disambiguation point for each individual item occurred at exactly 0 msec in peristimulus time. Analyses were thus locked to the relative disambiguation point for each item, permitting a precise analysis of any MMN memory trace activation as a function of psycholinguistic properties of the newly learned words. EEG data were then epoched −600 to 200 msec (with “0” the relative disambiguation point across items) and processed with a 30-Hz low-pass filter. Epoched data were rebaselined to −50 to 0 msec before the relative disambiguation point to account for the shifting of the epoch point between items. Because of the varying disambiguation point across items, different relative word intensities preceded the disambiguation point; baselining the data immediately before disambiguation thus ensured these acoustic differences did not contribute to the MMN (e.g., Shtyrov et al., 2010). The removal of excessively noisy trials was then implemented using the Fieldtrip Visual Artifact Rejection tool (Oostenveld et al., 2011); this measured the overall variance in voltage within each trial, and trials with exceptionally high variance were removed. This process removed 3.52% of trials overall. Following this, the remaining trials were averaged to form an ERP for each condition over the oddball task on each day.
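The epoching and rebaselining steps can be illustrated for a single channel. This is a plain-Python sketch under stated assumptions, not the Fieldtrip/EEGlab pipeline used in the study: the function name is hypothetical, and the signal is assumed to be already downsampled to 250 Hz with the disambiguation point given as a sample index.

```python
FS_HZ = 250  # sampling rate after downsampling


def epoch_and_baseline(signal, onset_index, fs=FS_HZ,
                       tmin_ms=-600, tmax_ms=200, baseline_ms=(-50, 0)):
    """Cut one epoch around the disambiguation point and subtract the baseline mean.

    Time 0 corresponds to the disambiguation point (onset_index).
    """
    n_pre = int(round(-tmin_ms * fs / 1000))
    n_post = int(round(tmax_ms * fs / 1000))
    if onset_index - n_pre < 0 or onset_index + n_post >= len(signal):
        raise ValueError("epoch extends beyond the recording")
    segment = signal[onset_index - n_pre: onset_index + n_post + 1]
    times_ms = [(i - n_pre) * 1000 / fs for i in range(len(segment))]
    baseline = [v for t, v in zip(times_ms, segment)
                if baseline_ms[0] <= t < baseline_ms[1]]
    offset = sum(baseline) / len(baseline)
    return times_ms, [v - offset for v in segment]


# Toy signal: constant 5.0 before the disambiguation point, 3.0 from it onward.
toy_signal = [5.0] * 200 + [3.0] * 200
times_ms, corrected = epoch_and_baseline(toy_signal, onset_index=200)
```

Because the baseline is taken from the 50 msec immediately before time 0, any constant offset preceding the disambiguation point is removed regardless of where that point falls within the word.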
Midline electrodes Cz, CPz, Pz, POz, and Oz, showing the most negative raw voltage in the grand-averaged topographies, were pooled together for spatial smoothing and to increase the signal-to-noise ratio because of the relatively low number of trials per condition (Shtyrov et al., 2010). Mean amplitudes in a 50-msec time window from the first negative peak in the grand-averaged waveform across both days and pseudoword conditions (∼130–180 msec) were analyzed. To isolate the MMN from other components, a difference wave was computed by subtracting each participant's known standard voltage from their correlated and uncorrelated pseudoword voltage on each day (cf. Bishop & Hardiman, 2010). This difference wave measured the degree of critical pseudoword discrimination from the competitor environment of known words.
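The pooling, difference-wave, and mean-amplitude computations can be sketched in a few lines. The helper names and the toy data are hypothetical; the sketch only illustrates the arithmetic described above and assumes ERPs are lists of voltages sampled every 4 msec from −600 to 200 msec.

```python
def pool_channels(channel_erps):
    """Average ERPs sample by sample across electrodes (e.g., Cz, CPz, Pz, POz, Oz)."""
    n_channels = len(channel_erps)
    return [sum(samples) / n_channels for samples in zip(*channel_erps)]


def difference_wave(deviant_erp, standard_erp):
    """Deviant (pseudoword) minus standard (known word) voltage at each sample."""
    return [d - s for d, s in zip(deviant_erp, standard_erp)]


def mean_amplitude(times_ms, wave, window_ms=(130, 180)):
    """Mean voltage within the analysis window (inclusive bounds)."""
    selected = [v for t, v in zip(times_ms, wave)
                if window_ms[0] <= t <= window_ms[1]]
    return sum(selected) / len(selected)


# Toy example: two electrodes with a negativity confined to the analysis window.
times_ms = [-600 + 4 * i for i in range(201)]
in_window = [130 <= t <= 180 for t in times_ms]
deviant_channels = [
    [-1.0 if w else 0.0 for w in in_window],
    [-0.5 if w else 0.0 for w in in_window],
]
standard_channels = [[0.0] * 201, [0.0] * 201]

mmn_wave = difference_wave(pool_channels(deviant_channels),
                           pool_channels(standard_channels))
mmn_amplitude = mean_amplitude(times_ms, mmn_wave)
```

A more negative `mmn_amplitude` corresponds to stronger discrimination of the pseudoword from the known-word standard.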
To examine any quantitative consolidation-based changes in discrimination, we analyzed the MMN difference wave elicited by correlated and uncorrelated deviants over the oddball task on each day. We reasoned that if there was a facilitatory effect of consolidation on phonological form learning there should be a greater evoked MMN magnitude on Day 2 for one (or both) of the newly learned pseudoword types. Although an online increase in a pseudoword MMN within a single session has been found previously in a comparable oddball task (Shtyrov et al., 2010), we reasoned that a consolidation-driven change in discrimination should yield an overall quantitative change in MMN magnitude from Day 1 to Day 2.
Performance on the learning task indicated good knowledge of the correlated word associations by the end of the exposure session, with group-level accuracy averaged over the final ten exposures of the learning task at 74.38% (SD = 19.69), which was significantly above chance levels [t(23) = −16.35, p < .001]. Figure 3 shows the learning curve for the correlated words over the course of the experiment.
In the recognition memory test, conducted on Day 2, six participants did not respond for more than 50% of trials in one condition, meaning accuracy scores could not be computed for those individuals. Recognition memory accuracy scores for the remaining participants showed above-chance recognition of both correlated and uncorrelated pseudowords [correlated: t(17) = 19.56, p < .001; uncorrelated: t(17) = 6.42, p < .001]. A Condition (2: correlated vs. uncorrelated) × Item Type (2: Learned item vs. Foil) ANOVA on percentage accuracy further yielded a significant main effect of Condition only [F(1, 17) = 23.31, p < .001, ηp² = .58], with correlated words exhibiting significantly higher recognition accuracy than uncorrelated words [t(17) = 4.83, p < .001; correlated M = 91.67%, SD = 9.04; uncorrelated M = 71.3%, SD = 14.07]. There was no effect of Condition on recognition memory RTs.
The association recall test, conducted on Day 2 after the oddball and recognition memory test, was scored by the percentage of correlated words correctly assigned to their referent picture (out of the array of 30 pictures). Percentage accuracy showed that participants retained good knowledge of the word and picture associations on Day 2 (M = 64.58%, SD = 22.15). Errors were predominantly from “no object” responses (not assigning correlated words to an object; 20.83%). Assigning a correlated word to an incorrect picture constituted 9.03% of errors, and labeling a picture with the uncorrelated minimal pair of its correlated label (e.g., labeling the boap correlated object as a boak) constituted 4.86% of errors.
Effects of Consolidation and Meaning on the MMN
To examine any consolidation-based changes in discrimination, the correlated and uncorrelated MMN difference wave on each day was submitted to a Condition (Correlated vs. Uncorrelated) × Day (Day 1 vs. Day 2) repeated-measures ANOVA. This comparison yielded a significant main effect of Condition only [F(1, 23) = 14.02, p = .001, ηp² = .38], with a significantly more negative correlated word MMN (M = −0.34, SD = 0.63) than uncorrelated word MMN (M = 0.06, SD = 0.55) (Figure 4). There was no main effect of Day and no interaction between Condition and Day (both Fs < 1 and ps > .4).
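As a consistency check, partial eta squared can be recovered from a reported F ratio and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the Condition effects reported for the MMN and for recognition memory; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


mmn_condition = partial_eta_squared(14.02, 1, 23)          # MMN main effect of Condition
recognition_condition = partial_eta_squared(23.31, 1, 17)  # recognition memory accuracy
```

Both values round to the effect sizes given in the text (.38 and .58, respectively).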
Relationship between the MMN and Semantic Association Learning
The ERP analysis clearly showed enhanced phonological form discrimination for correlated words relative to uncorrelated words and that this enhancement was present on both days, suggesting that consolidation did not strengthen access to the new phonological representations. However, CLS accounts also predict that a second function of consolidation may be the transformation of episodic representations to abstract lexical representations (e.g., Davis & Gaskell, 2009). From this prediction, there are (at least) two possible sources of knowledge about newly learned phonological forms. One is episodic knowledge from recent learning, whereas the other is via a lexical store independent of episodic knowledge. Given this, we sought to distinguish the contribution of these two sources of knowledge to our MMN effects, to ascertain whether different types of knowledge drove the MMN on each day. We thus investigated the extent to which the explicit learning of semantic associations on Day 1 underpinned the MMN for correlated words on Day 1 and Day 2. One participant was excluded from this analysis, because of having a learning score >2.5 standard deviations below the mean accuracy score. We ran a partial correlation, controlling for word list, between each participant's accuracy score on the correlated word associations at the end of the learning task (averaged over the final 10 exposures) and the correlated word MMNs on Day 1. This analysis revealed a significant negative correlation between semantic learning accuracy and the correlated word MMNs [r(20) = −.59, p < .005]. This analysis indicated that, as semantic learning accuracy improved, correlated word discrimination improved, which was indexed by a more negative MMN voltage. We then tested whether this benefit of episodic knowledge extended to enhanced discrimination of the correlated words on Day 2 and found no significant relationship [r(20) = .18, p = .44]. 
Meng's Z test (Meng, Rubin, & Rosenthal, 1992) confirmed that the correlations differed significantly between Day 1 and Day 2 (Z = 2.56, p = .01). Figure 5 presents scatterplots of these correlations.¹ Our correlational analyses thus suggested that, in the correlated word condition, phonological discrimination (indexed by the MMN) was initially tied to semantic learning accuracy, but following a period of offline consolidation, there was no relationship between semantic learning and phonological discrimination.
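For readers unfamiliar with the test, the following is a minimal Python sketch of the Meng et al. (1992) Z statistic for comparing two dependent correlations that share one variable (here, semantic learning accuracy correlated with the Day 1 and the Day 2 MMN). The statistic also requires the correlation between the two non-shared variables (the Day 1 and Day 2 MMNs), which is not reported in the text. The values of `r_x` and `n` in the usage lines are therefore placeholders, and the output is not expected to reproduce the reported Z. The sketch also ignores any adjustment for the word-list covariate.

```python
import math


def meng_z(r1, r2, r_x, n):
    """Meng, Rosenthal, & Rubin (1992) Z test for two dependent correlations.

    r1, r2 : correlations of the shared variable with each of two variables
    r_x    : correlation between those two variables
    n      : number of participants
    Returns (Z, two-tailed p).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)       # Fisher r-to-z transforms
    r_sq_mean = (r1 ** 2 + r2 ** 2) / 2           # mean squared correlation
    f = min((1 - r_x) / (2 * (1 - r_sq_mean)), 1.0)  # f is capped at 1
    h = (1 - f * r_sq_mean) / (1 - r_sq_mean)
    z = (z1 - z2) * math.sqrt((n - 3) / (2 * (1 - r_x) * h))
    p = math.erfc(abs(z) / math.sqrt(2))          # two-tailed normal p value
    return z, p


# r1 and r2 are the correlations reported in the text; r_x and n are
# PLACEHOLDERS chosen for illustration, not values from the study.
z, p = meng_z(r1=-0.59, r2=0.18, r_x=0.30, n=23)
print(f"Z = {z:.2f}, p = {p:.3f}")
```

The sign of Z depends only on the order in which the two correlations are entered, so the absolute value is what is usually reported.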
We sought to establish whether the provision of systematic semantic information facilitates the learning of phonological representations and how overnight consolidation impacts on these representations. Participants learned spoken novel words accompanied by a novel visual referent, which was either systematically associated with the novel word (correlated condition) or differed on every trial (uncorrelated condition). We subsequently tested newly acquired phonological representations using the MMN potential as an index of auditory discrimination. Results showed a main effect of semantic condition only, with those words in the correlated condition yielding enhanced discrimination from known words. Overnight consolidation did not strengthen access to these phonological representations. However, although discrimination performance did not change as a function of consolidation, correlational analyses suggested that it was underpinned by different sources of knowledge across the two days of testing. Explicit knowledge of the semantic associations in the learning task was reflected in discrimination of the correlated words on Day 1 (r = −.59) but not on Day 2 (r = .18). The critical finding of this study is thus that semantic information can enhance the acquisition of new phonological representations, with the possibility that a period of offline consolidation may assist in the abstraction of these representations.
As shown by the correlated word MMNs on both Day 1 and Day 2, semantic knowledge facilitated the learning of phonological representations. Although previous conclusions regarding the role of semantic exposure on aspects of word learning have been mixed (cf. Leach & Samuel, 2007; Breitenstein et al., 2005; Dumay et al., 2004), the current study demonstrates that the provision of systematic semantic information confers a selective benefit for acquiring new phonological form representations. This result poses important constraints on models of word learning and memory, which must account for a semantic influence on phonology not only during known word recognition (e.g., Tyler, Voice, & Moss, 2000) but also during the relatively early stages of learning these words in the first place. Distributed connectionist models can perhaps best account for this interactive influence between language subsystems (e.g., Davis & Gaskell, 2009; Gaskell & Marslen-Wilson, 1997; McClelland et al., 1995). In particular, a key behavior of such models is that novel words characterized by systematic mappings between word forms and meanings are learned with greater ease than novel words lacking this systematicity (Rueckl & Dror, 1994). The current study extends these findings by suggesting that novel words with a degree of systematicity (i.e., a semantic association) are not only learned more readily than those without, but that this systematicity directly impacts upon phonological form learning itself, rather than simply word level recall of new items.
However, such models may benefit from considering the impact of learning goals on the outcome of the acquisition process. The learning task in the current study emphasized associative learning, but it is equally plausible that an emphasis on phonological learning would minimize the recruitment of semantic information during training and consequently not afford such a semantic benefit (see Takashima et al., 2014; also cf. Yoncheva, Blau, Maurer, & McCandliss, 2010; Forster, 1985). The impact of learning goals on initial acquisition also has implications for the time course of consolidation; for example, Szmalec et al. (2012) suggested that the implicit learning of new word forms via a repetition task led to more efficient lexical consolidation than the explicit learning of word forms (as in phoneme monitoring paradigms; e.g., Dumay & Gaskell, 2007). Learning goals are thus a central factor in evaluating the extent to which semantic information is recruited during training and its subsequent impact on consolidation.
The conclusion that semantic systematicity impacts on phonological form learning in particular is supported by the temporal precision of the MMN we used to measure learning. Similar to Shtyrov et al. (2010), we observed an evoked MMN for the discrimination of the novel (correlated) words from existing words following a defined recognition point in the speech signal, where the MMN indexed the perceived phonological contrast between the novel word and known word. Furthermore, as the MMN was elicited automatically in the absence of attention to the speech stream and thus without any specific processing goals, it provided a precise measure of the degree of phonological discrimination of the newly learned words. Notably, we employed a multifeature MMN paradigm that required the fine-grained discrimination of two minimal novel words from a known word, which could be substantially more taxing than the learning and discrimination of a single minimal novel and known word pair as in Shtyrov et al. (2010). The increase in phonological learning demands in the current study could have thus contributed to observing no stable MMN response for the uncorrelated words on Day 2. It is also notable that the MMN is sensitive to the familiarity of linguistic stimuli, where it is evoked for familiar words rather than simply in response to a phonemic contrast. The MMN can distinguish native phonemic contrasts, where discrimination between native phoneme categories (e.g., /bɑ/-/dɑ/) elicits an MMN without any training (e.g., Shestakova et al., 2002; Phillips et al., 2000; Dehaene-Lambertz, 1997; Näätänen et al., 1997). However, when native phonemic contrasts are presented within novel words the MMN is significantly reduced (Pulvermüller et al., 2001). 
Shtyrov and Pulvermüller (2002) tested whether this reduction of MMN for phonemic contrasts in novel words was because of the unfamiliarity of novel word stimuli by comparing the MMN responses for (i) word deviants against word standards, (ii) word deviants against pseudoword standards, and (iii) pseudoword deviants against word standards. The MMN elicited by word deviants (conditions i and ii) was significantly greater than for the pseudoword deviants (condition iii). This suggested a critical factor in MMN magnitude to linguistic stimuli was familiarity of the deviant stimulus rather than simply a phonemic or lexicality difference between the deviant and standard stimuli, in which case the pseudoword deviant versus word standard should have elicited a comparable MMN (see also Korpilahti, Krause, Holopainen, & Lang, 2001, for similar results). Interestingly, this suggests that the MMN evoked by word stimuli may be based at least partly on a top–down influence of word representations benefiting discrimination. It is possible that the uncorrelated words did not establish strong enough representations to influence discrimination on Day 1 or Day 2 in the current study and potentially had a slower time course of establishing new phonological representations.
One important question following this is whether the inconsistent pictures in the uncorrelated word condition unfairly disadvantaged the learning of the uncorrelated phonological forms, thus potentially exaggerating the systematic semantic benefit observed. Although we cannot rule out this possibility, it is important to recognize that the uncorrelated words had significantly above-chance behavioral recognition accuracy on Day 2, indicating that participants had a degree of familiarity with the uncorrelated words, albeit less than the correlated words. Furthermore, varying the associative systematicity between the correlated and uncorrelated condition arguably provided a more realistic proxy of real-world learning than contrasting the correlated word condition with a “form-only” condition (e.g., as in Dumay et al., 2004; Takashima et al., 2014). In real-world situations, we are rarely exposed to a spoken word with no potential semantic meaning or goal to acquire one, and it is not uncommon to experience a word in different contexts across several exposures and thus struggle to extract a specific meaning, such as in the case of words with multiple meanings (e.g., bug). Finally, in experimental situations contrasting semantic and “form-only” conditions (e.g., Takashima et al., 2014; Dumay et al., 2004), there is not only a difference in semantic content between the two conditions but also a categorical difference in learning goals, information load, and attentional demands. We therefore suggest that the current learning paradigm provides a contrast between associative semantic learning and an ambiguous learning situation where words could be treated as having either many potential referents or no referent, which is not unlike real-world word learning situations.
Recent research on consolidation effects in novel word learning has drawn heavily on CLS theories of memory (e.g., Davis & Gaskell, 2009; McClelland et al., 1995). The central tenet of these theories is that newly learned words are stored initially as episodic representations mediated by a fast-learning hippocampal store and over a period of consolidation become less dependent on this episodic memory as they become integrated with existing knowledge and therefore represented neocortically. If this instantiation is correct, we should expect to see a greater contribution of episodic knowledge to phonological form representations immediately after learning, with decay in this episodic contribution over time as newly learned words become increasingly lexicalized (see Tamminen & Gaskell, 2013, for a similar argument). Our correlational analysis suggested this was the case: Episodic knowledge of the semantic associations was tied to correlated word discrimination on Day 1 but was unrelated to it on Day 2. Importantly, the lack of association between semantic learning and the correlated word MMN on Day 2 is unlikely to be the result of participants simply forgetting the associations: The association recall accuracy data collected on Day 2 showed that participants retained strong knowledge of the correlated word–picture associations after consolidation, with 64.58% accuracy when selecting the correct referent from an array of the novel pictures. It is important to note that this is a substantially more difficult task than selecting from the two pictures presented in the learning task. Taken together, these data suggest that consolidation decreased the reliance of the correlated word phonological forms on learned associations from the training task and that this decreased reliance may have been a specific consequence of consolidation, rather than failure to retain memory of the associations overnight. 
These data are consistent with a CLS account (Davis & Gaskell, 2009; McClelland et al., 1995) and extant literature suggesting that consolidated knowledge can be represented independently of episodic knowledge (e.g., Tamminen & Gaskell, 2013; Tamminen et al., 2012; see also Gomez et al., 2006). Nonetheless, we recognize that because the consolidation-based abstraction of the correlated words was based on a correlational change, rather than more direct evidence of independence from memory of semantic associations, this finding requires support in future research.
Alternative accounts of lexical learning assert that words learned in adulthood can only be represented episodically (e.g., Qiao, Forster, & Witzel, 2009; Jiang & Forster, 2001). Our data add to evidence that is inconsistent with this claim (see also Dumay & Gaskell, 2012). If the newly learned words could only achieve an episodic representation, we ought to have observed a postconsolidation relationship between the correlated word MMN and semantic learning. That this was not the case suggests that new phonological forms may be represented independently of episodic knowledge and that this independent representation could require offline consolidation. However, it is important to recognize that we did not measure the engagement of the newly learned words with existing lexical items (Leach & Samuel, 2007). It thus remains to be established what consequences this effect on phonological representations has for the full lexical integration of newly learned words and their engagement with existing knowledge.
The current findings thus pose several questions for specifying the impact of semantics on the word learning process. Given the adverse effect of semantic exposure on the time course of lexical integration (Takashima et al., 2014; Dumay et al., 2004; cf. Henderson et al., 2013), one possibility is that the semantic benefit on learning new phonological form representations observed here does not transfer to their offline integration with existing lexical items. This would suggest that phonological form learning and lexical integration reflect two separate stages of word memory formation, which are differentially impacted by semantic information. Alternatively, it could be the case that a learning task with semantic information must also recruit phonological information sufficiently well for the time course of lexical integration to be unimpaired by semantic knowledge. We note, however, that acquiring new semantic knowledge in terms of visual referents may differ from learning more semantically rich meanings, which link to existing semantic knowledge, and we have thus measured just one aspect of a semantic influence on word learning. It is also the case that our novel items had a large phonological neighborhood size in contrast to the studies of Takashima et al. (2014) and Dumay et al. (2004), which utilized items with few close phonological neighbors (e.g., cathedruke–cathedral). Thus, it is also possible that semantic knowledge is beneficial only in the acquisition of new words with high phonological neighborhoods, akin to the impact of imageability (a semantic variable) in skilled spoken word recognition for words in high competition cohorts only (e.g., Tyler et al., 2000). The way in which a semantic advantage for learning new phonological form representations relates to the offline impact of semantic knowledge in lexical integration therefore remains an important avenue for future work.
We have demonstrated that systematic semantic knowledge facilitates the acquisition of new phonological representations and that consolidation may provide an opportunity for these phonological representations to become less dependent on the episodic knowledge they are linked to before consolidation. Given the mixed evidence for the precise role of semantic information in word learning, the current study provides an important advance by suggesting that this knowledge is advantageous for the relatively low-level learning of phonological form representations and that these representations may be abstracted from episodic knowledge as a consequence of offline overnight consolidation. We thus provide new evidence elucidating the nature and time course of a semantic influence on the development of new phonological representations.
D. E. A. was supported by a British Academy Postdoctoral Fellowship and by the Medical Research Council (United Kingdom) intramural program (MC-A060-5PQ40). K. R. was supported by a research grant from the Economic and Social Research Council (United Kingdom; ES/L002264/1).
Reprint requests should be sent to Erin Hawkins, Department of Psychology, Royal Holloway University of London, Egham, Surrey, TW20 0EX, United Kingdom, or via e-mail: firstname.lastname@example.org.
¹ A reviewer wondered whether this correlation was driven by a data point in the bottom left corner, which showed a learning score of 45%. When this data point was removed, the pattern of data was unchanged [Day 1: r(19) = −.57, p = .007; Day 2: r(19) = .41, p = .07], and the correlations still differed significantly between Day 1 and Day 2 (Meng's Z = 3.07, p = .002).