Abstract

When listening to modified speech, either naturally or artificially altered, the human perceptual system rapidly adapts to it. There is some debate about the nature of the mechanisms underlying this adaptation. Although some authors propose that listeners modify their prelexical representations, others assume changes at the lexical level. Recently, Larsson, Vera, Sebastian-Galles, and Deco [Lexical plasticity in early bilinguals does not alter phoneme categories: I. Neurodynamical modelling. Journal of Cognitive Neuroscience, 20, 76–94, 2008] proposed a biologically plausible computational model to account for some existing data, one which successfully modeled how long-term exposure to a dialect triggers the creation of new lexical entries. One specific prediction of the model was that prelexical (phoneme) representations should not be affected by dialectal exposure (as long as the listener is exposed to both standard and dialectal pronunciations). Here we present a series of experiments testing the predictions of the model. Native listeners of Catalan, with extended exposure to Spanish-accented Catalan, were tested on different auditory lexical decision tasks and phoneme discrimination tasks. Behavioral and electrophysiological recordings were obtained. The results supported the predictions of our model. On the one hand, both error rates and N400 measurements indicated the existence of alternative lexical entries for dialectal varieties. On the other hand, no evidence of alterations at the phoneme level, either in the behavioral discrimination task or in the electrophysiological measurement (MMN), could be detected. The results of the present study are compared with those obtained in short-term laboratory exposures in an attempt to provide an integrative account.

INTRODUCTION

One important property of our speech perception system is its flexibility. Indeed, we can rapidly adapt it to natural variation, such as accented speech (Clarke & Garrett, 2004), or to artificial manipulations, such as vocoded speech (Davis, Johnsrude, Hervais-Adelman, Taylor, & McGettigan, 2005), sine-wave speech (Remez, Rubin, Berns, Pardo, & Lang, 1994), or time-compressed speech (Sebastian-Galles, Dupoux, Costa, & Mehler, 2000; Pallier, Sebastian-Galles, Dupoux, Christophe, & Mehler, 1998). In fact, native speech representations are far from immutable. For instance, after moving to a new region or state where a different dialect is spoken, listeners rapidly adapt to the new speech properties and, if after staying there for some time they return to their original location, they are often said to speak with an accent (Munro, Derwing, & Flege, 1999). This raises the question of what mechanisms underlie these adaptations. The answer revolves around whether prelexical representations are adjusted to accommodate the modified input, whether lexical entries are changed, or whether both levels are altered. The present study aims to explore the consequences of extended exposure to a dialect at different levels of the speech perception system.

These incidental observations of high plasticity of the native speech perception system to foreign accents have found support in different experimental studies. Clarke and Garrett (2004) gave native English listeners very brief exposure to English sentences produced by native speakers of Spanish and Chinese, who spoke accented English. After just two to four habituation sentences, participants were able to recognize target words presented at the end of sentences in accented English as efficiently as in a control (unaccented) condition. Maye, Aslin, and Tanenhaus (2008) reported convergent results using a lexical decision task. In this research, participants' performance was measured after listening to both neutral-accent English and an artificial novel accent. The novel accent consisted of shifting some front vowels, so that, for instance, the word “witch” was pronounced as “wetch.” The results showed that after a short exposure (20 min), participants increased their word judgments for shifted words (like “wetch”). Importantly, they also generalized this response shift to novel lexical items they had not been previously exposed to. The authors considered that this generalization shows that the adaptation to foreign-accented speech involves nonlexical processes (but see Floccia, Goslin, Girard, & Konopczynski, 2006, for partially divergent results). All these studies assume that the shift from initial comprehension problems with regional or foreign accented speech to efficient (normal) processing reflects the operation of a swift prelexical mechanism capable of matching the incoming forms onto those used in the lexicon.

A recent series of studies using a perceptual learning paradigm has unveiled some properties of this rapid adaptation mechanism. In a seminal study, Norris, McQueen, and Cutler (2003) exposed a group of participants to stimuli that were ambiguous in an /f/–/s/ continuum. After hearing just 20 words containing ambiguous stimuli, embedded within 200 filler words, the perceptual system was recalibrated and the boundary between /f/ and /s/ shifted. These results have been extended to the case of plosives (Kraljic & Samuel, 2006) and other fricatives (Eisner & McQueen, 2005; Kraljic & Samuel, 2005). Finally, perceptual adaptation has been shown to be relatively long-lasting and robust despite speaker variability (Eisner & McQueen, 2006; Kraljic & Samuel, 2005). Although the different authors do not fully agree as regards the level at which adaptation takes place, namely, featural (Kraljic & Samuel, 2006) or phoneme (Eisner & McQueen, 2005), they do agree that adaptation modifies prelexical processing. However, they either remain silent about the potential modifications of the lexical entries or assume that the adaptation mechanism would filter variability, thus making lexical representation resistant to variation in the speech signal. In fact, most of these explanations assume that lexical representations have an abstract nature, without specific information about the details of the different realizations (McQueen, Cutler, & Norris, 2006). In summary, these studies argue: (1) that repeated exposure to a particular dialect does not imply the creation of new (or alternative) lexical entries, but simply the creation of prelexical routines that would remap the incoming information into the lexical entries; and (2) that over time the lexical feedback which aids perceptual learning could lead listeners to retune their prelexical categories. However, all the above-reviewed studies concerned short exposures to dialectal or modified speech. 
To our knowledge, there are no studies that have explored in a systematic way the long-term consequences, both at the prelexical and lexical levels, of extended exposure to accented speech.

Although there is no evidence of experimental work addressing the joint prelexical and lexical dynamics of long-term exposure to variability, several studies have sought to determine how variation is encoded at the lexical level, taking advantage of the existence of allophonic variation. Considered together, these studies have yielded mixed results. Connine (2004) asked native American English listeners to perform a phoneme categorization task in word/nonword continua. The segment to be detected was either /b/ or /p/ and it always appeared in the initial position. The critical manipulation was that the targets were embedded in either a flap (e.g., preDy–breDy) or a /t/ variant (e.g., preTTy–breTTy) carrier stimulus. The results showed higher lexical biased responses (in the example, pretty) for the more frequent (flap) variant, thereby supporting the idea that English listeners stored a dialectal variant of English (flap) together with the canonical form. Converging results were obtained by McLennan, Luce, and Charles-Luce (2003) using a different methodological approach but the same allophonic variation. These authors compared the repetition effects of different forms of the same words, and observed equivalent repetition effects for the different allophonic variations: Flapped words primed carefully articulated words and vice versa. However, Sumner and Samuel (2005) failed to replicate this result. These authors compared semantic priming and repetition effects for canonical and allophonic variations. Although they found equivalent semantic priming for different allophonic variations (indicating that variations mapped onto the same lexical entries), they observed no systematic repetition effects when noncanonical forms were used. Thus, they concluded that only the canonical form was represented in the long-term lexicon.

Sebastian-Galles, Echeverria, and Bosch (2005) claimed that extended exposure to a dialectal variation induced the creation of a new lexical representation. These authors asked native Catalan participants to perform an auditory lexical decision task. In their materials, nonwords were made by exchanging a single vowel from existing words, the exchange involving the /e/–/ɛ/ vowel pair. The interest of this exchange lies in the fact that this vowel contrast is not found in Spanish and that Spanish natives have great difficulties in perceiving it (Pallier, Colome, & Sebastian-Galles, 2001; Bosch, Costa, & Sebastian-Galles, 2000; Sebastian-Galles & Soto-Faraco, 1999; Pallier, Bosch, & Sebastian-Galles, 1997). Participants tested in this study lived in a Catalan–Spanish bilingual context, an environment in which many bilinguals whose first language is Spanish pronounce the Catalan language with the Spanish phoneme inventory. Thus, in the Spanish-accented dialect of Catalan, the /e/–/ɛ/ contrast is neutralized and words containing these two sounds are pronounced with a single mid-front vowel that Catalan natives usually assimilate to their /e/ vowel. What Sebastian-Galles et al. (2005) found was that Catalan natives showed an asymmetrical pattern of errors when they had to decide about the lexical status of the nonword stimuli. In this study, two types of stimuli were used. One type (e-type stimuli) comprised words including the phoneme /e/ (such as /finestra/, meaning “window”) and their corresponding nonwords, made by replacing the vowel /e/ with the vowel /ɛ/. Participants had no problems in making the appropriate responses, both for words and nonwords. The other type (ɛ-type stimuli) was composed of words including the phoneme /ɛ/ (such as /gallɛda/, meaning “bucket”) and the corresponding nonwords, made by replacing the vowel /ɛ/ with the vowel /e/. This time, participants showed a strong bias to accept nonwords as real words. 
The explanation given by the authors was that the continued exposure of Catalan listeners to the Spanish dialect of Catalan (which includes ɛ-type nonwords like */galleda/, but not e-type ones like */finɛstra/) would eventually lead to the creation of two lexical entries for ɛ-type stimuli. Therefore, they concluded that the asymmetrical pattern was a sign of new representations having been added to the mental lexicon to represent dialectal variation.

Recently, we proposed a computational model to account for the findings of the Sebastian-Galles et al. (2005) study and to explore the implications of their results for the relationship between lexical and sublexical units (Larsson, Vera, Sebastian-Galles, & Deco, 2008). Specifically, we simulated two competing network structures, each containing four pools of integrate-and-fire neurons, and compared their performance. A pool in these models represented either one of two word contexts (a conglomeration of phonemes) or one of two single phonemes, the combination of which enabled two complete words and two complete nonwords to be represented. Each model corresponded to a particular cognitive hypothesis. Thus, Model 1 had as free parameters connections enabling an interaction between phoneme representations, whereas the free parameters of Model 2 were connections between the phoneme and word-context representations. In these models, upon presentation of a stimulus, the persistent activity in the stimulated pools was interpreted as word recognition. Using this protocol, a lexical decision task was simulated and it was found that Model 1 could not reproduce the data reported in Sebastian-Galles et al., showing almost chance performance in identifying */finɛstra/ as a nonword. In contrast, Model 2 showed the asymmetry seen in the experiments, with high performance for */finɛstra/ and significantly lower performance for */galleda/. Apart from this strong result of Model 2, the choice of its architecture as a plausible mechanism responsible for the experimental results was further strengthened in a simulated discrimination task. There it was shown that if a weight asymmetry was introduced between this model's phonemes, performance was considerably lower than with symmetric connections at the same level. Thus, Model 2, with connections creating an interaction between phoneme and word-context representations, was proposed as the most plausible architecture. 
This was interpreted as evidence for a new lexical entry having been created over time through exposure to mispronunciations of words such as /gallɛda/ as */galleda/. Thus, in the proposed model, and in striking contrast to some of the studies reviewed above, new lexical entries were created after a supposed long-term exposure to a dialect, but no modifications were observed at the phoneme level (the model assumed simultaneous exposure to dialectal and standard pronunciations). In Larsson et al. (2008), preliminary (partial) empirical evidence for the model was also provided. Thus, in the present study, we aimed to test the predictions of Model 2 in a systematic way by comparing the performance of the very same group of participants across different tasks, presumably tapping into prelexical (phoneme) and lexical information.

First, participants were tested using the very same materials and procedure as used in Sebastian-Galles et al. (2005). The goal of Experiment 1 was to determine whether the selected participants showed the asymmetric pattern observed in the past. A further objective of the present project was to provide additional support for the hypothesis of modified lexical entries for dialectal variations, for which conflicting data exist. Experiment 2 was aimed at providing such support. In particular, differences in the N400 component between words and nonwords were expected to be different for e-type and ɛ-type stimuli. In the following two experiments, participants' phoneme discrimination performance was assessed. In Experiment 3, participants performed a behavioral discrimination task, whereas in Experiment 4, electrophysiological measures were taken. In particular, participants' MMN, an automatic index of perceptual processing, was measured.

EXPERIMENT 1: SELECTION OF PARTICIPANTS

The first experiment was a replication of the auditory lexical decision task described in Sebastian-Galles et al. (2005), Experiment 1. The major aim of this experiment was to determine whether the sample of participants to be subsequently studied showed an asymmetric pattern in the discrimination of words and nonwords containing vowels /e/ and /ɛ/.

Methods

Participants

Thirty-two native Catalan listeners (24 women) participated in this experiment (average age = 22.7 years, SD = 9.4). Catalan was the only language spoken at home and it had been their dominant language for all their lives. They were born in Barcelona or its metropolitan area, and therefore, they grew up in a bilingual society. They had received a Spanish–Catalan bilingual education, and at the time of testing they were students at the University of Barcelona. Neither language nor auditory deficits were reported by any of them. They received either course credits or monetary compensation for their participation.

Materials

This experiment used the same materials as in Sebastian-Galles et al. (2005). Sixty-six Catalan words containing the vowel /e/ and 66 Catalan words containing the vowel /ɛ/ were selected (experimental words). Words varied in length (from one to four syllables) and word sets were matched for word frequency (average tokens per million for e-words = 722.27, SD = 1281.77; for ɛ-words, average = 662.19, SD = 1270.83, t < 1; Rafel i Fontanals, 1998). The corresponding nonwords were created by replacing the vowel /e/ with /ɛ/ and vice versa in each of the words. Thus, the word “galleda” (meaning “bucket”), pronounced [gλɛð],1 generated the nonword *[gλeð], whereas the word “finestra” (meaning “window”), pronounced [finestr], generated the nonword *[finɛstr]. The former was called an “ɛ-type” stimulus and the latter an “e-type” stimulus. Because Catalan features vowel reduction, /e/ and /ɛ/ can only occur in stressed positions, so these changes were restricted to stressed syllables. Cognate status of the stimuli was determined and yielded a total of 52 noncognates (23 e-words and 29 ɛ-words) and 78 cognates (43 e-words and 35 ɛ-words). There were no differences in frequency between cognates and noncognates (p > .16). A χ2 test showed that there were no statistical differences in the distribution of cognates and noncognates across each stimulus type (χ2 = 1.482, ns) (for more details, see Sebastian-Galles et al., 2005). Forty words, matched in length and frequency, were used as fillers, and 40 nonwords were also included. These nonwords were made by changing one vowel in an existing Catalan word, although this change never involved either vowel /e/ or /ɛ/. Two different lists were created in such a way that half of the experimental words, together with the nonwords corresponding to the remaining experimental words, appeared in each list. Half of the participants were tested in each list.
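The cognate check can be illustrated with a short calculation. The sketch below computes a Pearson χ2 statistic for a 2 × 2 table of stimulus type by cognate status. It assumes no continuity correction, and it assumes cell counts of 23 and 29 noncognates and 43 and 35 cognates for e-words and ɛ-words, respectively (the cognate split implied by the reported total of 78). The function name is hypothetical and the code is not taken from the original analysis.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Assumption: no continuity (Yates) correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Rows: e-words, ɛ-words; columns: noncognates, cognates (assumed counts).
statistic = chi_square_2x2(23, 43, 29, 35)
print(round(statistic, 3))
```

With these assumed counts the statistic agrees with the reported value of 1.482, well below the critical value of 3.84 for one degree of freedom at the .05 level.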

Procedure

Participants were individually tested in sound-attenuated booths. Stimuli were binaurally presented through Sennheiser HMS224 headphones. The experimental situation was controlled by the program EXPE (Pallier, Dupoux, & Jeannin, 1997). Participants were seated in front of a computer screen on which instructions were displayed. They were told that nonwords would be very similar to real words, and that they should pay attention to vowels because nonwords had been made by replacing a single vowel. They were specifically warned that, in many cases, vowels /ɛ/ and /e/ were exchanged. Feedback was provided during the training phase, in which some words and nonwords containing the /e/–/ɛ/ exchange were included. The participants' task was to decide as fast as possible whether the auditory stimuli were Catalan words or not. They had to press one button with the index finger of their dominant hand to indicate a “word” response, and another button with the middle finger of their dominant hand to indicate a “nonword” response.

Results and Discussion

Because of high error rates, three words and their corresponding nonwords were discarded (two from the e-set and one from the ɛ-set). Although both error rates and reaction times were measured, it was decided, given the high error rates for the ɛ-type nonwords, and as in Sebastian-Galles et al. (2005), to restrict the analyses to the error rates. Participants made very few errors for both types of words (e-type = 3.3%, SD = 3.8; ɛ-type = 5.1%, SD = 4.5). The percentage of errors for the e-type nonwords was low (14.4%, SD = 15.6). However, for the ɛ-type stimuli, the error rate was very high (43.4%, SD = 17.7). As in Sebastian-Galles et al., a nonparametric statistic (A′) was used as an accuracy measure and analyses were performed with this statistic.2 A post hoc t test comparison of means yielded reliable differences between e-type and ɛ-type stimuli [A′ = 0.942 and 0.845, respectively; t1(31) = 14.704, p < .001; t2(127) = 6.820, p < .001]. These results indicated that participants had greater difficulties in distinguishing ɛ-type words from their corresponding nonwords than they did for e-type stimuli. Furthermore, they closely replicate the finding of Sebastian-Galles et al., who reported A′ averages for e-type and ɛ-type stimuli of 0.953 and 0.874, respectively.
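For readers unfamiliar with A′, the following is a minimal sketch of how such a score can be computed from a hit rate (words correctly accepted) and a false-alarm rate (nonwords wrongly accepted). The formula used is the commonly cited nonparametric formulation of A′; that it matches the exact computation used here is an assumption, because the formula itself is not given in this section. The function name is hypothetical.

```python
def a_prime(hit_rate, false_alarm_rate):
    """Nonparametric sensitivity index A' (0.5 = chance, 1.0 = perfect).

    Assumption: the standard two-branch formula, with the mirrored
    branch used when false alarms exceed hits.
    """
    h, f = hit_rate, false_alarm_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


# Illustration only: group-level error rates from the text
# (3.3% errors on e-type words, 14.4% errors on e-type nonwords).
print(round(a_prime(1 - 0.033, 0.144), 3))
```

Scores computed from group-level rates, as in this illustration, need not equal the mean of per-participant A′ scores on which the reported analyses are based.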

The goal of this experiment was to assess whether participants showed an asymmetrical pattern in their discrimination of e-type and ɛ-type words and nonwords. Indeed, Catalan natives performed significantly better at rejecting nonwords from the e-type category (i.e., */finɛstra/) than from the ɛ-type one (*/galleda/).

The prevailing model in our computational study yielded two predictions. The first is that phoneme representations are not modified and the second that new lexical entries are created for dialectal variations. As mentioned, there are conflicting results as regards this second prediction. The following experiment therefore aimed to test this prediction further.

EXPERIMENT 2: EXPLORING LEXICAL REPRESENTATIONS

Our empirical evidence to support the existence of alternative lexical entries for dialectal varieties comes from the asymmetrical pattern of results observed in Sebastian-Galles et al. (2005). However, the very nature of the lexical decision task does not allow for unambiguous attribution of the origin of this asymmetry to the lexical level. It might be that participants detected the differences between words and nonwords in all cases; but under time pressure (as is usually the case in a lexical decision task), they accepted some ɛ-type nonwords as real words. It could also be that the differences between ɛ-type words and nonwords were too subtle to allow for conscious decisions, such as those requested in a lexical decision task. One way of avoiding this problem is by using electrophysiological measurements.

The N400 is an ERP component that is sensitive to semantic integration and processing of words presented in sentential contexts (McCallum, Farmer, & Pocock, 1984; Kutas & Hillyard, 1980). When isolated stimuli are presented in the context of a lexical decision task, pronounceable nonwords elicit larger N400 responses than do words, both in the visual (Holcomb, 1988, 1993; Bentin, 1987; Rugg & Nagy, 1987; Smith & Halgren, 1987; Bentin, McCarthy, & Wood, 1985) and in the auditory (Holcomb & Neville, 1990) domain. This difference between words and nonwords has been interpreted in different ways, although it is always assumed that the N400 is an index of lexical activation (Holcomb, Grainger, & O'Rourke, 2002) or semantic memory search (Kutas & Federmeier, 2000).

Sebastian-Galles, Rodriguez-Fornells, de Diego-Balaguer, and Diaz (2006) used the N400 component as a measure of lexical access. These authors also used words and nonwords where the /e/–/ɛ/ contrast was manipulated. One hypothesis of their study was that although no differences should be observed between ɛ-type words and nonwords (both stimuli would elicit a “word” response), enlarged N400 responses for e-type nonwords, when compared to e-type words, would be found. However, the authors failed to obtain electrophysiological differences in the N400 component, not only between ɛ-type words and nonwords but also between e-type words and nonwords. This was a surprising result because in their behavioral responses participants showed a clear asymmetry in the pattern of errors; in fact, the pattern of results was the same as that obtained in Sebastian-Galles et al. (2005). The similarity of the N400 component between e-type and ɛ-type stimuli clearly compromises one of the predictions of the proposal by Larsson et al. (2008). One potential explanation for this lack of differences in the study by Sebastian-Galles et al. (2006) could be that in order to increase the number of observations, participants were tested with all the stimuli in two different sessions. That is, they were tested one day with one of the lists and a few days later with the other list. If we presuppose that participants mistook many experimental nonwords for real words, then some stimuli (in particular, ɛ-type nonwords) were repeated across sessions. There is ample evidence that repetition effects lead to a diminished neural response, in particular of the N400 component (for a review, see Kutas & Federmeier, 2000). It is important to note that this attenuation has been considered to be the consequence of repeated access to semantic memory representations.

The goal of the present experiment was to replicate the study of Sebastian-Galles et al. (2006), but this time with participants being tested just once with the experimental materials. If the predictions of Larsson et al. (2008) are correct, significant differences in the N400 component should be observed only for e-type stimuli, and not for ɛ-type ones.

Methods

Participants

Sixteen right-handed participants (12 women, 4 men) from the previous experiment took part in the electrophysiological recordings. The age range was 18–24 years. Four subjects were discarded because of recording problems (2), excessive eye blinking (1), or anxiety (1). Informed consent was procured for every participant, and they received monetary compensation for their participation in this experiment.

Materials

The materials and procedure used here were the same as those in the experiment by Sebastian-Galles et al. (2006), with the following exceptions: 113 Catalan words containing vowel /e/ and 114 Catalan words containing vowel /ɛ/ were selected (in the original experiment there were 120 stimuli of each type, but 7 e-type and 6 ɛ-type items were discarded because they generated many erroneous responses in that experiment; for further details, see Sebastian-Galles et al., 2006). Most words were nouns, and a few verbs (15, in their citation form, the infinitive) were also included. Words varied in length (from one to four syllables). The two types of words did not differ in terms of frequency (written word frequency per million for e-type words = 97.78, SD = 205.49; for ɛ-type words = 147.07, SD = 456.94, t test, p > .29) (Rafel i Fontanals, 1998). Lexical neighborhood densities for e-type and ɛ-type words were calculated from a phonetic Catalan corpus (Bonafonte, unpublished). A neighbor was defined as any word that would result from the addition, deletion, or substitution of a single phoneme. There were no statistical differences between the two types of stimulus (t < 1; average numbers of neighbors were as follows: e-type words = 1.89, SD = 2.78; ɛ-type words = 1.96, SD = 3.57).
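The neighbor definition above can be made concrete with a small sketch. It counts the lexicon entries that differ from a target by exactly one phoneme addition, deletion, or substitution. Words are represented as tuples of phoneme symbols; the toy lexicon and the function names are hypothetical and do not come from the corpus used in the study.

```python
def is_neighbor(a, b):
    """True if b differs from a by one phoneme added, deleted, or substituted."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Addition or deletion: removing one phoneme from the longer form
    # must yield the shorter form.
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def neighborhood_density(target, lexicon):
    """Number of lexicon entries that are neighbors of the target."""
    return sum(is_neighbor(target, word) for word in lexicon)


# Hypothetical toy lexicon, one tuple of phoneme symbols per word.
toy_lexicon = [
    ("k", "a", "t"),
    ("b", "a", "t"),       # substitution neighbor
    ("k", "a", "t", "s"),  # addition neighbor
    ("a", "t"),            # deletion neighbor
    ("d", "o", "g"),       # not a neighbor
]
print(neighborhood_density(("k", "a", "t"), toy_lexicon))
```

The target itself is not counted as its own neighbor, which is one reasonable reading of the definition.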

Nonwords were generated in the same way as described previously, namely, by exchanging vowels /e/ and /ɛ/. Words and nonwords did not differ in terms of length (for e-type stimuli: words = 656 msec, nonwords = 637 msec, t test, p > .45; for ɛ-type stimuli: words = 649 msec, nonwords = 687 msec, t test, p > .90) or loudness (measured as the average RMS power: for e-type stimuli: words = −20.16 dB, nonwords = −21.81 dB, t test, p > .42; for ɛ-type stimuli: words = −20.42 dB, nonwords = −21.37 dB, t test, p > .43). Neighborhood density was also calculated for nonwords. There were no statistical differences between the two types of stimulus (t < 1; average numbers of neighbors were as follows: e-type nonwords = 2.64, SD = 3.38; ɛ-type nonwords = 2.98, SD = 3.79).

Recognition points (the earliest point at which a word can be identified) and deviation points (the earliest point at which a nonword differs from any existing word) were determined using a Catalan dictionary (Institut d'Estudis Catalans, 2007). Recognition and deviation points were measured using Cool Edit software. There were no differences for either recognition points (e-type words = 393 msec, SD = 127; ɛ-type words = 363 msec, SD = 101; t = 1.93, p > .10) or deviation points (e-type nonwords = 340 msec, SD = 95; ɛ-type nonwords = 341 msec, SD = 98; t test, p > .90).

One hundred fifteen Catalan words were used as fillers, none of them containing either vowel /e/ or /ɛ/. Nonwords were created by exchanging the stressed vowel (again, vowels /e/ and /ɛ/ were not used). Stimuli were divided into two lists. Half of the words and half of the nonwords appeared in the first list and the other half in the second list. The two members of a word/nonword pair never appeared in the same list. The order of presentation was randomized for each participant.
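The list construction described above can be sketched as follows. The code splits word/nonword pairs into two halves and builds two lists so that each list contains the words of one half and the nonwords of the other half, which guarantees that the two members of a pair never share a list. The function name, the seed parameter, and the placeholder items are hypothetical.

```python
import random


def build_lists(pairs, seed=0):
    """Split (word, nonword) pairs into two counterbalanced lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:]
    # Each list takes words from one half and nonwords from the other.
    list_a = [w for w, _ in first] + [nw for _, nw in second]
    list_b = [w for w, _ in second] + [nw for _, nw in first]
    return list_a, list_b


# Placeholder items standing in for real stimuli.
example_pairs = [(f"word{i}", f"nonword{i}") for i in range(6)]
list_a, list_b = build_lists(example_pairs)

# Per-participant randomization of presentation order.
order = list(list_a)
random.shuffle(order)
```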

Procedure

Subjects were comfortably seated in an electrically shielded room with dimmed illumination and adjustable temperature. Stimuli were presented binaurally through Sennheiser HD 435 Manhattan headphones at a comfortable sound level (79 dB). As in Experiment 1 (and as in Sebastian-Galles et al., 2006), participants were asked to perform an auditory lexical decision task and to respond by pressing a button. Half of the participants used their right hand for “yes” responses and their left one for “no” responses. The other half of participants responded in the reverse hand order. Instructions were displayed on the screen and further explanations were given if necessary. As in Experiment 1, participants were told that changes always involved vowels and that, in many cases, they involved vowels /e/ and /ɛ/. Stimulus presentation was divided into four blocks of approximately 5 min each (plus one nonrecorded training block). An asterisk appeared in the middle of the screen to indicate the beginning of a new trial. Participants were asked to avoid blinking, or to do so when they saw the asterisk (there was a period of 500 ± 200 msec in which to do this). Reaction times were measured from stimulus onset. The presentation of the stimuli was controlled by the experimental software EXPE (Pallier, Bosch, et al., 1997; Pallier, Dupoux, et al., 1997).

Results and Discussion

Behavioral Data

Participants showed the same pattern of errors as observed in Experiment 1. They made few erroneous responses for words (6.2%, SD = 4.9, for e-type words and 4.6%, SD = 3.1 for ɛ-type words). However, errors were high for ɛ-type nonwords (30.7%, SD = 25.1) and intermediate for e-type nonwords (14.2%, SD = 15.7). The A′ scores were calculated as described previously in Experiment 1. A t-test comparison showed that subjects responded better to e-type stimuli (mean A′ = 0.938) than to ɛ-type ones (mean A′ = 0.894) [t(11) = −2.938, p = .013].3

Electrophysiological Data

ERP data were recorded by means of a 32-channel EEG recording system (BrainAmp amplifier and Brain Vision Recorder software). Twenty-nine Ag/AgCl electrodes were mounted in an electrode cap according to the International 10–20 System (Electro Cap International) (CP3/4, CP1/2, CP5/6, Cz, F3/4, F7/8, FC1/2, FC5/6, Fp1/2, Fz, O1/2, P3/4, PO1/2, Pz, T3/4, A1/2). Recording was performed using an on-line digital band-pass filter of 0.1 to 50 Hz (Brain Vision Recorder 1.02) with linked tip nose reference. Vertical and horizontal eye movements were monitored with an electrode at the infraorbital ridge and lateral edge of the left eye. Electrode impedances were kept below 5 kΩ. Sampling rate was 250 Hz. High-cutoff (70 Hz) and notch (50 Hz) filters were used. Data were re-referenced before filtering to left and right mastoids. ERPs were calculated over 1500 msec poststimulus onset intervals relative to a baseline of 200 msec prior to stimulus onset. Figure 1 depicts the grand-average ERP (n = 12) elicited by words (thin line) and nonwords (thick line) for the two experimental conditions (stimulus locked).

Figure 1. 

Stimulus-locked ERPs synchronized to the onset of the stimulus. ERPs for both stimulus types illustrate the differences between word (thin line) and nonword (thick line) only in the e-type stimuli. Midline electrode locations are shown.

For both conditions, the complex N100–P200 and a late negativity at central and frontal sites are observed. Mean amplitudes and peak latencies were calculated at different time windows for each participant, stimulus type, and lexicality at three electrode sites (Fz, Cz, and Pz). As in Sebastian-Galles et al. (2006), and because of the high error rates, particularly with ɛ-type nonwords, bioelectrical signals corresponding to incorrect behavioral responses were included in the analyses.

Mean amplitudes and latencies were calculated in 50–150 msec and 150–250 msec windows. They were then subjected to separate ANOVAs with stimulus type, lexicality, and electrode as factors (the Greenhouse–Geisser correction was used in this and subsequent analyses). No effect or interaction reached significance (only F values > 1 are reported here; in the 50–150 msec time window: lexicality: F_amplitude = 1.914; stimulus type: F_amplitude = 3.461, F_latency = 4.729; electrode: F_amplitude = 3.447, F_latency = 0.698, all p > .05; in the 150–250 msec time window: electrode: F_amplitude = 1.089, F_latency = 1.122, all p > .1).

Mean amplitudes were also calculated in the 300–600 msec window for each participant, stimulus type, and lexicality at three electrode sites (Fz, Cz, and Pz). The ANOVA showed that the electrode factor reached significance levels [F(2, 22) = 13.671, p = .0001]. The lexicality factor almost reached significance [F(1, 11) = 4.691, p = .051], but it was modulated by the significant interaction of stimulus type and lexicality [F(1, 11) = 5.654, p = .036]. Separate t-test comparisons at each electrode site showed that the amplitude for e-type nonwords (e.g., */finɛstra/) was significantly larger than for e-type words (e.g., /finestra/) [at Fz: t(11) = 2.777, p = .043; at Cz: t(11) = 3.032, p = .011; and at Pz: t(11) = 2.778, p = .018]. However, no differences were found between words and nonwords for the ɛ-type stimuli (for all three locations, t < 1). Latencies were also measured in the same temporal window (300–600 msec) for the same three channels. The analysis of peak latencies showed no significant effect or interaction (ɛ-type: words = 540 msec, nonwords = 543 msec; e-type: words = 541 msec, nonwords = 544 msec; all F < 2.267, p > .1). Figure 2 shows the topographic distributions of the negativity starting around 300 msec (at the top), being maximal after 400 msec.

Figure 2. 

Topographic distribution of the effects for two stimuli types (e, ɛ) by lexical status (word, nonword). Figure shows amplitude average values for the 300–700 msec interval, separated in three time windows (300–434 msec, 434–568 msec, 568–700 msec).

In contrast to the results of Sebastian-Galles et al. (2006), the present study found converging results for the behavioral and electrophysiological data. Indeed, significant differences in the amplitude of the N400 component were observed between words and nonwords, but only in the e-type condition. The lack of differences between ɛ-type words and nonwords in this component can be taken as an indication that Catalan native listeners process properly pronounced words and their dialectal (mis)pronunciations in an equivalent way. This conclusion is supported by the significant difference in the N400 component observed in the e-type condition between words and nonwords, suggesting that the lack of differences for ɛ-type words and nonwords is not due to a lack of experimental or statistical power. The next experiments aimed to explore participants' capacity to distinguish the phoneme contrast /e/–/ɛ/.

EXPERIMENT 3

The model of Larsson et al. (2008) predicts that extended exposure to dialectal variations in which a particular phoneme contrast is lost should not induce any change in the discrimination of that phoneme contrast (as long as exposure to this contrast continues to be present in the environment). Thus, it follows that in the linguistic situation of Catalan natives exposed to the Spanish dialect of Catalan, the continuous contact with mispronunciations of words such as */galleda/ or */estel/ (instead of /gallɛda/ or /estɛl/) should create new lexical entries (coexisting with the real ones). However, the dynamics of this process should not significantly alter the sublexical (phoneme) representations. If this were the case, then Catalan natives, who have been shown to have difficulties in deciding that */galleda/ or */estel/ are not Catalan words, should not show difficulties in the discrimination of phonemes /e/ and /ɛ/. However, if lexical feedback modifies phoneme categories, they may show a reduced performance in the discrimination of the two vowels.

In the present experiment, the same 32 participants tested in Experiment 1 were asked to perform a continuous discrimination task in which different tokens of syllables /de/ and /dɛ/ from different speakers were used. The use of multiple tokens from different speakers at a relatively fast rate was adopted after extensive piloting with different materials and rates of presentation. In a series of pilot studies, the number of speakers was increased from one to three and also the number of tokens was increased, making the task more difficult. Furthermore, the presentation rate was the fastest participants could be tested at before their performance dropped in a catastrophic way.

Methods

Materials

Three different female Catalan native speakers produced several tokens of syllables /de/ and /dɛ/ in a single session (between 8 and 10 tokens each speaker). Individual files were created for each token using Cool Edit (sampling rate 16,000 Hz and 16 bits of resolution). A final sample of 18 different tokens was selected for this experiment, half for each category. To make the selection, a two-stage process was carried out. First, each speaker discarded from her own utterances those that she considered very poor exemplars. After that, the three speakers, plus an additional native Catalan speaker, selected at least three tokens as the best exemplars from each category. To enter the final selection, each token had to be selected by at least three judges.

The average lengths of the selected stimuli were 390 msec (SD = 49.3) for syllables /de/ and 412 msec (SD = 38.5) for syllables /dɛ/. Intensity was measured with Praat (Boersma & Weenink, 2005) (syllable /de/ average = 69.8 dB; syllable /dɛ/ average = 69.7 dB). There were no significant differences between the two types of stimulus in any measure.

Procedure

The experimental setting and equipment were the same as in Experiment 1. Participants were asked to press a button as soon as they detected a change of category in a continuous stream of syllables /de/ and /dɛ/. Stimuli were presented in a pseudorandom order. There were no fewer than three tokens of the same category in a row and no more than eight tokens before a change took place. Stimuli were separated by about 400 msec. The SOA was held constant at 800 msec. One trial was defined as the 800-msec interval between stimulus onsets. An initial training phase was conducted in which participants were asked to do the same task, with only two tokens (one from each category) from a single speaker. In this training phase, which consisted of the presentation of 25 tokens, a visual cue (the number 1 or 2 displayed in the center of a computer screen) was shown contingent on each phoneme category. Furthermore, when a change took place, another cue (an asterisk) appeared, indicating that they should press the key. These cues were not presented in the experimental phase; instead, an asterisk appeared during syllable presentations as a fixation point.

In the experimental phase, all tokens were presented. The stream of syllables consisted of 200 tokens of each category. The 400 tokens were presented in two blocks of 200 tokens each. Subjects could have a rest between the two blocks. Participants were specifically encouraged to respond to categorical differences, not just acoustical ones. They were also encouraged to give their response as quickly as possible, without increasing the error rates. The total duration of the experiment was around 15 min. The experiment was controlled by the Presentation 0.60 software.
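The run-length constraint described in the Procedure (at least three and at most eight tokens of a category before a change, with 200 tokens per category) can be illustrated with a short sketch. This is not the authors' randomization script, which is not described. The function names, the choice of 35 runs per category (which yields 69 change trials, close to the roughly 70 reported), and the labels "de" and "dE" (standing for /de/ and /dɛ/) are assumptions made for illustration. Token and speaker selection and the split into two blocks are not modeled.

```python
import random


def sample_run_lengths(n_runs, total, lo=3, hi=8, rng=random):
    """Return n_runs integers, each in [lo, hi], that sum to total."""
    if not n_runs * lo <= total <= n_runs * hi:
        raise ValueError("total cannot be reached with these run limits")
    lengths = [lo] * n_runs
    for _ in range(total - n_runs * lo):
        # Add one token to a randomly chosen run that still has room.
        open_runs = [i for i, n in enumerate(lengths) if n < hi]
        lengths[rng.choice(open_runs)] += 1
    return lengths


def build_stream(n_per_category=200, runs_per_category=35, seed=None):
    """Build an alternating stream of category labels (hypothetical sketch)."""
    rng = random.Random(seed)
    runs_a = sample_run_lengths(runs_per_category, n_per_category, rng=rng)
    runs_b = sample_run_lengths(runs_per_category, n_per_category, rng=rng)
    # "dE" is an ASCII stand-in for /dɛ/; the starting category is random.
    first, second = rng.sample(["de", "dE"], 2)
    stream = []
    for len_a, len_b in zip(runs_a, runs_b):
        stream += [first] * len_a + [second] * len_b
    return stream
```

Calling `build_stream(seed=1)` returns a list of 400 labels in which every run of identical labels is between three and eight tokens long.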

Results and Discussion

Because of the asymmetrical nature of the error rates in the lexical decision task, participants' data were analyzed by separating the two possible directional changes: /dɛ/ → /de/ and /de/ → /dɛ/. To obtain the A′ statistic (see Note 4), correct response (CR) and false alarm (FA) percentages were calculated. A trial in which the stimulus belonged to a different category than the previous one was considered a “change trial.” A CR was counted when a response was given during a change trial. Participants could give their response during the whole trial duration (800 msec). When a response was given at any other moment, it was counted as a FA. For each participant there were around 70 change trials and 330 no-change trials. Percentages of CRs were high in both conditions [/dɛ/ → /de/: 86.4% (12.8); /de/ → /dɛ/: 81.8% (15.7), standard deviation in parentheses], whereas FAs were low [/dɛ/ → /de/: 5.3% (4.2); /de/ → /dɛ/: 4.4% (3.0)]. The A′ scores showed high performance, regardless of the direction of change (A′ = 0.945 for /dɛ/ → /de/; A′ = 0.935 for /de/ → /dɛ/).
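The scoring rule described above can be sketched as follows, combining the change-trial definition with the A′ formula given in Note 2. This is a minimal illustration, not the authors' analysis code. The function names and the input layout (one category label and one response flag per 800-msec trial) are assumptions. The sketch pools both directions of change, whereas the article reports them separately, and it ignores the first trial, which has no predecessor.

```python
def a_prime(hit, fa):
    """A' as given in Note 2 of the article, which states it for hit > fa."""
    if not hit > fa:
        raise ValueError("the formula is stated only for hit > fa")
    return 0.5 + ((hit - fa) * (1 + hit - fa)) / (4 * hit * (1 - fa))


def score_stream(categories, responded):
    """Score a change-detection stream.

    categories: category label of each trial, in presentation order.
    responded:  True if a button press fell within that trial's 800 msec.
    Returns (correct-response rate, false-alarm rate, A').
    """
    hits = change_trials = false_alarms = no_change_trials = 0
    for i in range(1, len(categories)):
        if categories[i] != categories[i - 1]:
            # Change trial: a response here is a correct response (CR).
            change_trials += 1
            hits += bool(responded[i])
        else:
            # No-change trial: a response here is a false alarm (FA).
            no_change_trials += 1
            false_alarms += bool(responded[i])
    hit_rate = hits / change_trials
    fa_rate = false_alarms / no_change_trials
    return hit_rate, fa_rate, a_prime(hit_rate, fa_rate)
```

Because A′ is nonlinear in the two rates, applying `a_prime` to group-mean percentages need not reproduce a mean of per-participant A′ scores.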

The results of this experiment did not reveal any difficulty in the perception of the /e/–/ɛ/ phoneme contrast. The high performance supports the predictions of the model of Larsson et al. (2008). However, the possibility that the conclusions are based on a null result should be considered, and it may be that despite our efforts to make the task as hard as possible, participants had some unnoticed difficulties in the perception of this contrast. The following experiment aimed to further test this issue by using a more fine-grained measure, namely, the MMN, an automatic, electrophysiological index.

EXPERIMENT 4: MMN DISCRIMINATION

The MMN is an automatic change-detection response commonly used in phonetic categorization studies (Peltola et al., 2003; Näätänen et al., 1997). The MMN is elicited by oddball auditory stimuli that deviate from the preceding repetitive auditory stimulation (Näätänen, Tervaniemi, Sussman, Paavilainen, & Winkler, 2001) and it can be estimated by subtracting the response to a frequent (standard) sound from that to an infrequent sound (deviant). Of particular relevance for the present study is MMN evidence suggesting that categorical sound perception is guided by recognition traces (Näätänen et al., 2001), such that speech perception is based on language-particular memory traces within the auditory cortex (Näätänen et al., 1997). The MMN has been shown to be sensitive to subtle phonetic differences. For instance, the amplitude wave for native categories has been reported to be larger than for nonnative stimuli (Näätänen et al., 1997). Therefore, it seems particularly appropriate to our current goals.

If participants modified their perceptual categories as a consequence of exposure to the dialectal variations, we should expect a diminished electrophysiological response, measured via the MMN amplitude, when comparing the /e/–/ɛ/ contrast to another phoneme contrast common to both languages and presumably not modified by exposure to the Spanish dialect of Catalan. However, if the model of Larsson et al. (2008) is correct and phonetic categories have not been significantly altered, no differences between the /e/–/ɛ/ contrast and another control phoneme pair should be found. In the present experiment, the electrophysiological responses to three different phoneme pairs were taken: /de/–/dɛ/, /dɛ/–/de/, and /di/–/de/. The location of these three vowels in the Catalan vowel space makes them equivalent in terms of perceptual discriminability (Bosch et al., 2000; Martínez Celdrán, 1994; Cerdà, 1972).

Methods

Participants

The same 16 participants from Experiment 2 were again tested. Data from four subjects were discarded because of excessive blinks or body movements (the same as in Experiment 2). Participants received monetary compensation for their participation in this experiment.

Materials

One token from each category (/de/ and /dɛ/) of the behavioral discrimination task was selected. To increase the chances of observing significant differences, the two tokens corresponded to those from the same speaker with the highest correct responses. For the /de/–/di/ contrast (first syllable refers to the deviant stimulus and the second syllable refers to the standard one), new tokens were produced by the same female speaker from which the /de/–/dɛ/ stimuli were selected. The new stimuli were selected with the same procedure as described in the previous experiment. Individual files were created for each stimulus type and all four files were edited with Cool Edit 2000 with the duration being fixed to 386 msec (the amplitudes of the first and final 30 msec were enveloped). No silent period was left at the beginning or end of the file, as can be seen in Figure 3. Table 1 provides different acoustic measurements for each syllable used in the present experiment: central vowel F1 and F2, average pitch, average intensity, and total and prevoicing duration. Finally, syllables /de/ and /dɛ/ activated similar word cohorts; the percentage of words starting with the sequence /de/ is 0.039, whereas for the syllable /dɛ/ it is 0.086 (that is, 21 words and 47 words, respectively, out of the 54,197 entries listed in the Catalan adaptation of the FESTIVAL project; Bonafonte, 2007; Black & Taylor, 1997). The percentage of words starting with /di/ was 2.084 (that is, 1130 words in the corpus). The larger number of words starting with the syllable /di/ is mainly due to two causes. First, unlike /e/ and /ɛ/, the vowel /i/ can appear in both stressed and unstressed syllables. Second, the syllables /di/ and /dis/ are prefixes in Catalan.

Figure 3. 

Spectrograms of the four syllables used in Experiment 4. The upper two spectrograms correspond to stimuli used in the /dɛ/–/de/ and /de/–/dɛ/ conditions. The bottom two spectrograms correspond to the stimuli used in the /di/–/de/ condition.

Table 1. 

Different Acoustic Measurements of Syllables Used in Experiment 4

                       /dɛ/    /de/    /de/    /di/
F1 (Hz)                638     444     494     304
F2 (Hz)                2176    2433    2360    2761
Intensity (dB)         80      82      80      80
Duration (msec)
  Whole syllable       386     386     386     386
  Prevoicing           101     101     103     99

Information about syllables used in the /de/–/dɛ/ and /dɛ/–/de/ comparisons is shown in the first two columns. Information about syllables used in the /de/–/di/ comparison is displayed in the last two columns. Formant frequencies (F1, F2, in Hz) were estimated over a central interval of 200 msec of the vowel period. Information about duration is provided for both the whole syllable and the prevoicing period. Measurements were obtained using Praat (www.praat.org).

Two pure tones (1000 and 1100 Hz, 79 dB) were also created to be used in an auditory control condition. These stimuli had the same duration (386 msec) and were edited using the very same procedures as the syllabic ones.

Procedure

Participants were tested in the same experimental setting and session as in Experiment 2. All participants were tested first in the auditory lexical decision task and, subsequently, in a passive oddball discrimination procedure. Each block consisted of 500 trials (85% standard, 15% deviant). SOA was fixed at 600 msec. Each condition was presented twice, and therefore, each participant was tested with eight blocks. Block presentation was randomized and blocks were repeated twice in a mirror way (i.e., for every subject, the first and last blocks were the same). The duration of a single block was around 6 min. After each block, the experiment was stopped and participants could rest; during the experiment they watched a silenced movie and were not asked to perform any task with the auditory stimuli.
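The block structure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' presentation script. The function names are hypothetical. The four condition labels (three syllable contrasts plus the tone control) are inferred from the statement that each condition was presented twice for a total of eight blocks, and "dE" is an ASCII stand-in for /dɛ/. The article does not say whether deviant positions were constrained (for example, a minimum number of standards between deviants), so the sketch simply shuffles.

```python
import random


def mirrored_block_order(conditions, rng):
    """Randomize the conditions, then repeat them in reverse order.

    With four conditions this gives eight blocks in which the first and
    last blocks are the same, the second and seventh are the same, etc.
    """
    order = list(conditions)
    rng.shuffle(order)
    return order + order[::-1]


def oddball_block(n_trials=500, p_deviant=0.15, rng=None):
    """Return a shuffled list of 'standard' and 'deviant' trial labels."""
    rng = rng or random.Random()
    n_deviant = round(n_trials * p_deviant)
    trials = ["deviant"] * n_deviant + ["standard"] * (n_trials - n_deviant)
    rng.shuffle(trials)
    return trials


# Assumed condition labels (deviant first, then standard, plus tones).
CONDITIONS = ["dE-de", "de-dE", "de-di", "tones"]
```

For example, `mirrored_block_order(CONDITIONS, random.Random(0))` returns eight labels that read the same forwards and backwards, and `oddball_block()` returns 425 standards and 75 deviants in random order.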

Electrophysiological Recording and Data Analysis

Electrophysiological data were obtained and processed as described in Experiment 2, with the exception that they were not re-referenced before filtering to left and right mastoids. ERPs were calculated over 600 msec poststimulus onset intervals relative to a baseline of 100 msec prior to stimulus onset. Mean MMN amplitudes were calculated at Fz from the difference wave (deviant minus standard) over a 40-msec interval (20 msec before and after the group-average peak) in each condition. Only data from subjects with more than 70 valid segments for each type of stimulus were used to calculate the averages.
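The MMN measures described above can be sketched as follows. This is a minimal illustration, not the authors' processing pipeline, which relied on dedicated EEG software. The function names are hypothetical. The 4-msec sample step follows from the 250-Hz sampling rate reported in Experiment 2, and the epoch is assumed to run from −100 to 600 msec. Taking the most negative point as the "peak" is also an assumption, based on the MMN being a negative deflection at Fz.

```python
SAMPLE_STEP_MS = 4.0     # 250 Hz sampling rate reported in Experiment 2
EPOCH_START_MS = -100.0  # assumed epoch start (100-msec prestimulus baseline)


def to_index(t_ms):
    """Convert a latency in msec to a sample index within the epoch."""
    return round((t_ms - EPOCH_START_MS) / SAMPLE_STEP_MS)


def difference_wave(deviant, standard):
    """Subtract the standard average from the deviant average, sample by sample."""
    return [d - s for d, s in zip(deviant, standard)]


def peak_latency(wave, start_ms=200.0, end_ms=350.0):
    """Latency (msec) of the most negative sample between start_ms and end_ms."""
    lo, hi = to_index(start_ms), to_index(end_ms)
    peak = min(range(lo, hi + 1), key=lambda i: wave[i])
    return EPOCH_START_MS + peak * SAMPLE_STEP_MS


def mean_amplitude(wave, center_ms, half_width_ms=20.0):
    """Mean of the wave in a window of +/- half_width_ms around center_ms."""
    lo = to_index(center_ms - half_width_ms)
    hi = to_index(center_ms + half_width_ms)
    window = wave[lo:hi + 1]
    return sum(window) / len(window)
```

In the article the 40-msec window is centered on the group-average peak, so `center_ms` would be the same for all participants within a condition, while `peak_latency` would be applied per participant for the latency analysis.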

Results and Discussion

Visual inspection of the tone condition showed no abnormal MMN waves for any participant. MMN waves reached significance values for all conditions, both when comparing standard versus deviant waves, and when comparing the difference wave against zero. MMN showed a fronto-central scalp distribution with inverted (positive) amplitudes in mastoids (see Table 2 and Figure 4). The mean amplitude voltage was subjected to a repeated measures ANOVA with condition (/dɛ/–/de/, /de/–/dɛ/, and /de/–/di/) and electrode (Fz, Cz) as within-subject factors. No significant effects or interactions were observed (all Fs < 1).

Table 2. 

MMN Values in Each Condition (/dɛ/–/de/, /de/–/dɛ/, and /de/–/di/) for Each Electrode (Fz, Cz)

Condition   Electrode   MMN Average Amplitudes (μV)   Standard vs. Deviant    Comparison against Zero
/dɛ/–/de/   Fz          −1.442 (1.816)                t = 3.117, p = .009     t = −2.752, p = .019
            Cz          −1.604 (1.344)                t = 4.276, p = .001     t = −4.133, p = .002
/de/–/dɛ/   Fz          −1.719 (2.166)                t = 3.739, p = .003     t = −2.750, p = .019
            Cz          −1.549 (1.651)                t = 4.276, p = .001     t = −3.252, p = .008
/de/–/di/   Fz          −2.031 (1.671)                t = 4.036, p = .002     t = −4.208, p = .001
            Cz          −1.925 (1.553)                t = 3.357, p = .006     t = −4.292, p = .001

Column 1: MMN average amplitudes (μV; standard deviation in parentheses). Column 2: t-test comparisons of standard and deviant stimuli (df = 11, all p < .01). Column 3: t-test comparisons of each MMN wave against zero (df = 11, all p < .02).

Figure 4. 

The grand average for standard (thin line), deviant (medium line), and difference (MMN) waveforms (thick line) in the three stimulus pairs, from Cz, Fz, and left (LM) and right mastoids (RM). The two upper panels show similar amplitudes for each stimuli pair. The two bottom panels show the polarity inversion at the mastoids.

To analyze potential differences in latencies, the maximum peak in the temporal window of 200–350 msec was selected in the Fz electrode, for each subject in each condition. The averages for each condition were as follows: /dɛ/–/de/: 282.6 msec (SD = 27.3); /de/–/dɛ/: 280.3 msec (SD = 37.81); /di/–/de/: 293.3 msec (SD = 28.2). Again, no difference reached significance (all t values, p > .15; see Figure 5; analogous results were obtained for the Cz electrode).

Figure 5. 

MMN latencies of the three conditions [in Fz electrode: /dɛ/–/de/: 282.6 msec (SD = 27.3), solid line; /de/–/dɛ/: 280.3 msec (SD = 37.81), thick solid line; /di/–/de/: 293.3 msec (SD = 28.2), dashed line; all t values p > .15].

These results replicate and extend those obtained in the previous experiment: Participants have no difficulties in discriminating phonemes /e/ and /ɛ/. Moreover, the electrophysiological response to the contrasts involving these two phonemes is not significantly different from that to another phoneme contrast, for which no influence of exposure to a dialect is suspected. These results are consistent with the prediction of the model of Larsson et al. (2008), which postulates no long-term modifications of the phoneme representations.

GENERAL DISCUSSION

The present series of experiments has explored the dynamics of prelexical and lexical representations in accommodating dialectal variation. Taken together, the results support two main conclusions. First, lexical entries are modified to capture the standard and dialectal pronunciations. Second, there is no need for modifications of the phoneme categories involved in the vowel contrast. These results also support the predictions of Larsson et al. (2008). The simulations performed by these authors showed that the model had to incorporate two properties in order to capture properly the asymmetric pattern of Catalan natives in the lexical decision task: modifications at the lexical level and no lack of discriminability at the phoneme level. The model incorporated both properties simply by altering the connections between the pools, which together constituted a new word in the mental lexicon. This modification was able to account for the asymmetry in the lexical decision task.

Larsson et al. (2008) provided preliminary experimental support, which has been extended by the present series of experiments. Here a group of Catalan natives, with extensive exposure to a Spanish dialect of Catalan, were tested on four different situations. First, they were tested on an auditory lexical decision task, with the same materials and procedure as used by Sebastian-Galles et al. (2005). The goal of this experiment was to assess the existence of an asymmetrical pattern in this group of participants. The existence of new lexical entries for dialectal variations was further explored in Experiment 2. In this experiment, differences in the N400 component for e-type words and nonwords were obtained: As in previous studies (Sebastian-Galles et al., 2006; Holcomb & Neville, 1990), (pronounceable) nonwords elicited larger amplitudes than did real words. This difference between words and nonwords was not obtained for ɛ-type stimuli. Indeed, the lack of differences in the N400 component supports the assumption that both stimuli are represented in the mental lexicon. The assumption that phoneme discrimination was not impaired was tested in the next two experiments. In Experiment 3, participants performed a behavioral discrimination task of isolated consonant–vowel syllables. The results indicated that despite various efforts to increase the difficulty of the task, participants' performance was very good, and thus, no evidence for perceptual difficulties could be obtained. Experiment 4 aimed to test the potential loss of discriminability between phonemes /e/ and /ɛ/ using a more sensitive measurement. In this experiment, participants' discrimination capacities were measured through the MMN. The results showed no differences in amplitude and latency between the discrimination of /e/ and /ɛ/ vowels and the discrimination of /i/ and /e/, a vowel contrast that is equivalent in terms of perceptual distance to the /e/–/ɛ/ one. The results of these two experiments confirmed, as far as these tasks are able to, that no long-term modifications of these categories had occurred.

Further support for the robustness of native phoneme perception is provided by the lack of frequency effects in phoneme discrimination. It should be noted that although Catalan natives may be exposed to relatively few exemplars of the /e/–/ɛ/ contrast, we have not been able to find any trace of perceptual or discrimination problems. As already mentioned, the Catalan language features vowel reduction, meaning that these two vowels (/e/ and /ɛ/) can only appear in stressed positions. It has been estimated that the frequency (percentage) of each of these two vowels is 5% and 7% (of all Catalan vowels; Rafel, 1980), and thus, they are relatively rare. In contrast, vowel /e/ in Spanish is very frequent (27% of all Spanish vowels; Alcina & Blecua, 1975). In a bilingual environment, the presence of the Catalan contrast should be even smaller. Assuming that our adult Catalan natives had a significant exposure to Spanish-accented Catalan, and that Spanish natives produce vowel /ɛ/ in a way that Catalan natives assimilate to vowel /e/, the actual amount of exposure to the vowel category /ɛ/ must be extremely low. There is ample evidence that humans, both infants and adults (Maye, Werker, & Gerken, 2002; Maye & Gerken, 2001), are highly sensitive to the statistical and distributional properties of phonemes in the environment when it comes to establishing phoneme contrasts. One main conclusion of the present series of experiments is therefore that frequency plays a relatively negligible role in (adult) native phoneme contrast perception. As shown, the perception of the /e/–/ɛ/ contrast is very robust; thus, in real-life situations, such as the one our participants live in, native contrasts are not easily cancelled out.

It should be noted that although our proposal is described in terms of a biophysically realistic neurodynamic model, describing neural activity at the synaptic and spiking levels, it is compatible with other proposals using different theoretical approaches. In particular, our proposal and results are compatible with the FUL (Featurally Underspecified Lexicon) model (Lahiri & Reetz, 2002; Lahiri & Marslen-Wilson, 1991). Two assumptions of this model are relevant here. First, it assumes that all features are extracted from the speech signal at a perceptual level, thus predicting the excellent (and symmetric) discrimination of syllables /de/ and /dɛ/. Second, it assumes that feature values that frequently undergo variation (e.g., due to assimilation) are not listed in the mental lexicon. Applied to the present situation, it might be argued that listeners only rarely hear variation for the e-type words. Therefore, it would be safe to represent the /e/ in detail. In contrast, ɛ-type words would appear either correctly pronounced (Catalan dominant bilinguals) or mispronounced (Spanish dominant bilinguals). For reliable word recognition, a successful strategy might thus be to leave the value for some feature (in the present case, the height feature that distinguishes both vowels) empty for ɛ-type words. In this way, ɛ-type nonwords would find an empty slot for the height value in the lexical representation of ɛ-type words (no-mismatch) and ɛ-type nonwords would be able to activate ɛ-type words in the lexicon. This would explain both the high error rates in the lexical decision task for ɛ-type nonwords and the lack of differences in the N400 component between these words and nonwords. Interestingly, an asymmetry that follows the same mismatch/no-mismatch logic has been found in the N400 nonword effect in a lexical decision task (Friedrich, Eulitz, & Lahiri, 2006).

How can the present results be reconciled with those observed in the perceptual learning literature? As discussed in the Introduction, the preferred explanation for the high plasticity observed in perceptual learning studies has been to assume that changes occur at the prelexical level, while the lexical representations remain relatively unchanged. As mentioned, a fundamental methodological difference between these studies and the current experiments rests on the amount of exposure (both in terms of quantity and duration) which participants had to the “abnormal” input. Our participants were exposed to the dialectal variety for very long periods (over years) and to very varied tokens. In a prototypical laboratory situation, participants are intensively exposed for a few minutes to a reduced number of tokens.

Interestingly, different cortical structures seem to be involved in fast and slow learning. McClelland, McNaughton, and O'Reilly (1995) proposed a “complementary systems” approach to account for these two types of learning situations. These authors suggested a hybrid memory system in which slow learning of general patterns would be supported by cortical networks, whereas the hippocampal system would be in charge of fast learning. Recently, Goldinger (2007) has proposed a similar architecture to account for different results in the domain of speech perception. According to this author, massive exposure would engage cortical mechanisms responsible for slow changes, leading to stable representations. In contrast, the changes observed in short laboratory exposures would reflect the functioning of more transient hippocampal systems. Thus, one theoretical possibility would be that the lexical changes observed in our study would be the result of long-term engagement of cortical structures, which are not involved in the short-term phoneme adjustments observed in laboratory situations. Although this framework seems initially attractive, it has limited support at present. In fact, current explanations of the perceptual learning experiments do not frame the neuroanatomical basis of the effect in terms of cortical–subcortical networks, but rather according to cortical–cortical ones. For instance, Davis and Johnsrude (2007) explain the phoneme boundary shifts observed in perceptual learning experiments as reflecting long-lasting (cortical) changes in phoneme categorization. Furthermore, McClelland himself (Mirman, McClelland, & Holt, 2006) has modeled the perceptual learning results in terms of a single interactive Hebbian learning network.

Our model was specifically designed to account for the asymmetric results reported by Sebastian-Galles et al. (2005). In this respect, there is no need for complementary systems to account for our current data. However, we acknowledge that it does not conflict with a complementary systems-type proposal because it models the long-term (and presumably cortical) mechanisms. Neurally realistic modeling and more empirical studies are clearly needed to reach a proper understanding of how long-term and transient information is processed in the brain.

Acknowledgments

We thank Begoña Díaz for her helpful suggestions and X. Mayoral for technical support. We also thank Antonio Bonafonte for granting access to the Catalan version of the FESTIVAL project. This study was supported by the Spanish Ministerio de Educación y Ciencia (with EC Fondos FEDER Consolider Ingenio 2010 CE-CSD2007-00121 to N. S. G. and G. D. and SEJ2007-60751 to N. S. G.). J. L. was supported by the European research project EmCAP (FP6-IST, contract 013123) and by the Generalitat de Catalunya through an IGSOC/IQUC grant, whereas F. V. C. was supported by a fellowship from the Spanish Ministerio de Educación y Ciencia (BES-2005-9031).

Reprint requests should be sent to Núria Sebastián-Gallés, GRNC, Departament de Tecnologies de la Informació i les Comunicacions. Universitat Pompeu Fabra. Carrer Tanger 122-140, 08018 Barcelona, Spain, or via e-mail: nuria.sebastian@eupf.edu.

Notes

1. To facilitate the reading, in the rest of the article the only phonetic symbols that will be used correspond to the /e/–/ɛ/ contrast.

2. A′ = 0.5 + {[(HIT − FA) × (1 + HIT − FA)]/[4 × HIT × (1 − FA)]}, for HIT > FA. A hit (HIT) was scored when the participant pressed the “yes” button in response to a word, and a false alarm (FA) when a nonword was accepted as a word.
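The A′ formula in Note 2 can be sketched in a few lines of Python. This is an illustration only, not the authors' analysis code: the function name `a_prime` and its argument names are hypothetical, and HIT and FA are assumed to be proportions between 0 and 1.

```python
def a_prime(hit_rate: float, fa_rate: float) -> float:
    """Nonparametric sensitivity index A' as given in Note 2.

    Assumption: hit_rate and fa_rate are proportions in [0, 1].
    The formula is only defined here for hit_rate > fa_rate.
    """
    if not (0.0 <= fa_rate < hit_rate <= 1.0):
        raise ValueError("A' (this form) requires 0 <= FA < HIT <= 1")
    diff = hit_rate - fa_rate
    # 0.5 + [(HIT - FA) * (1 + HIT - FA)] / [4 * HIT * (1 - FA)]
    return 0.5 + (diff * (1.0 + diff)) / (4.0 * hit_rate * (1.0 - fa_rate))


if __name__ == "__main__":
    # Hypothetical rates, chosen only to exercise the formula.
    print(round(a_prime(0.9, 0.2), 4))
```

A value of 0.5 corresponds to chance performance and 1.0 to perfect discrimination of words from nonwords.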

3. Reaction times for correct responses (range 200–3000 msec) were also calculated and submitted to an ANOVA. A main effect of stimulus type was observed [F(1, 15) = 12.013, p < .003], which was modulated by the Lexicality × Stimulus type interaction [F(1, 15) = 9.347, p < .008]. Although latencies did not differ between the two types of words (e-type: 1185 msec, ɛ-type: 1190 msec), ɛ-type nonwords were responded to faster than e-type ones (1293 and 1211 msec, respectively). The same pattern of significance was obtained when latencies to incorrect responses were included in the analysis.

4. d′ statistics were also computed. Significance patterns were the same as for the A′ statistic [/dɛ/ → /de/ d′ = 2.94, SD = 0.94; /de/ → /dɛ/ d′ = 2.83, SD = 1.01; t(31) = 1.134, p = .265]. To facilitate comparison across experiments, only the A′ results are reported.
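Note 4 does not state how d′ was computed. A minimal sketch, assuming the common yes/no definition d′ = z(HIT) − z(FA), where z is the inverse of the standard normal cumulative distribution, is given below. The function name `d_prime` is hypothetical, and discrimination designs sometimes call for other decision models, so this should not be read as the procedure used in the study.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Yes/no sensitivity index d' = z(HIT) - z(FA).

    Assumption: both rates lie strictly between 0 and 1. Rates of exactly
    0 or 1 have no finite z-score and would need a correction that the
    source does not describe, so they are rejected here.
    """
    for rate in (hit_rate, fa_rate):
        if not 0.0 < rate < 1.0:
            raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Hypothetical rates, chosen only to exercise the formula.
    print(round(d_prime(0.9, 0.1), 2))
```

Unlike A′, d′ assumes equal-variance normal distributions for signal and noise, which is one reason the two indices are often reported side by side.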

REFERENCES

Alcina, J., & Blecua, J. M. (1975). Gramática española [Spanish grammar]. Barcelona: Editorial Ariel.

Bentin, S. (1987). Event-related potentials, semantic processes, and expectancy factors in word recognition. Brain and Language, 31, 308–327.

Bentin, S., McCarthy, G., & Wood, C. C. (1985). Event-related potentials, lexical decision and semantic priming. Electroencephalography and Clinical Neurophysiology, 60, 343–355.

Black, A., & Taylor, P. (1997). Festival speech synthesis system: System documentation. Technical report, Human Communication Research Centre.

Boersma, P., & Weenink, D. (2005). Praat: Doing phonetics by computer. (Version 4.3.01) [Computer program]. Retrieved from http://www.praat.org/.

Bonafonte, A. (2007). Phonetic Catalan dictionary (adaptation to Catalan of the FESTIVAL project). Universitat Politècnica de Catalunya. Unpublished manuscript.

Bosch, L., Costa, A., & Sebastian-Galles, N. (2000). First and second language vowel perception in early bilinguals. European Journal of Cognitive Psychology, 12, 189–222.

Cerdà, R. (1972). El timbre vocálico en catalán [Vowel pitch in Catalan]. Madrid: CSIC.

Clarke, C. M., & Garrett, M. F. (2004). Rapid adaptation to foreign-accented English. Journal of the Acoustical Society of America, 116, 3647–3658.

Connine, C. M. (2004). It's not what you hear but how often you hear it: On the neglected role of phonological variant frequency in auditory word recognition. Psychonomic Bulletin & Review, 11, 1084–1089.

Davis, M. H., & Johnsrude, I. S. (2007). Hearing speech sounds: Top–down influences on the interface between audition and speech perception. Hearing Research, 229, 132–147.

Davis, M. H., Johnsrude, I. S., Hervais-Adelman, A., Taylor, K., & McGettigan, C. (2005). Lexical information drives perceptual learning of distorted speech: Evidence from the comprehension of noise-vocoded sentences. Journal of Experimental Psychology: General, 134, 222–241.

Eisner, F., & McQueen, J. M. (2005). The specificity of perceptual learning in speech processing. Perception & Psychophysics, 67, 224–238.

Eisner, F., & McQueen, J. M. (2006). Perceptual learning in speech: Stability over time. Journal of the Acoustical Society of America, 119, 1950–1953.

Floccia, C., Goslin, J., Girard, F., & Konopczynski, G. (2006). Does a regional accent perturb speech processing? Journal of Experimental Psychology: Human Perception and Performance, 32, 1276–1293.

Friedrich, C., Eulitz, C., & Lahiri, A. (2006). Not every pseudoword disrupts word recognition: An ERP study. Behavioral and Brain Functions, 2, 1–36.

Goldinger, S. D. (2007). A complementary-systems approach to abstract and episodic speech perception (pp. 49–54). XVI International Congress of Phonetic Sciences, Saarbrücken, Germany.

Holcomb, P. J. (1988). Automatic and attentional processing: An event-related brain potential analysis of semantic processing. Brain and Language, 35, 66–85.

Holcomb, P. J. (1993). Semantic priming and stimulus degradation: Implications for the role of the N400 in language processing. Psychophysiology, 30, 47–61.

Holcomb, P. J., Grainger, J., & O'Rourke, T. (2002). An electrophysiological study of the effects of orthographic neighborhood size on printed word perception. Journal of Cognitive Neuroscience, 14, 938–950.

Holcomb, P. J., & Neville, H. J. (1990). Auditory and visual semantic priming in lexical decision: A comparison using event-related brain potentials. Language and Cognitive Processes, 5, 281–312.

Institut d'Estudis Catalans. (2007). Diccionari de la llengua catalana [Dictionary of the Catalan language]. Barcelona: Enciclopèdia Catalana. Retrieved from http://dlc.iec.cat/index.html. January, 2008.

Kraljic, T., & Samuel, A. G. (2005). Perceptual learning for speech: Is there a return to normal? Cognitive Psychology, 51, 141–178.

Kraljic, T., & Samuel, A. G. (2006). Generalization in perceptual learning for speech. Psychonomic Bulletin & Review, 13, 262–268.

Kutas, M., & Federmeier, K. D. (2000). Electrophysiology reveals semantic memory use in language comprehension. Trends in Cognitive Sciences, 4, 463–470.

Kutas, M., & Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207, 203–205.

Lahiri, A., & Marslen-Wilson, W. (1991). The mental representation of lexical form: A phonological approach to the recognition lexicon. Cognition, 38, 254–294.

Lahiri, A., & Reetz, H. (2002). Underspecified recognition. In C. Gussenhoven & N. Werner (Eds.), Laboratory phonology VII (pp. 637–675). Berlin: Mouton.

Larsson, J. P., Vera, F., Sebastian-Galles, N., & Deco, G. (2008). Lexical plasticity in early bilinguals does not alter phoneme categories: I. Neurodynamical modelling. Journal of Cognitive Neuroscience, 20, 76–94.

Martínez Celdrán, E. (1994). Fonética [Phonetics]. Barcelona: Teide.

Maye, J., Aslin, R. N., & Tanenhaus, M. K. (2008). The weckud wetch of the wast: Lexical adaptation to a novel accent. Cognitive Science, 32, 543–562.

Maye, J., & Gerken, L. A. (2001). Learning phonemes: How far can the input take us? Proceedings of the 25th Annual Boston University Conference in Language Development, Boston, MA, 1, 480–490.

Maye, J., Werker, J. F., & Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82, B101–B111.

McCallum, W. C., Farmer, S. F., & Pocock, P. V. (1984). The effects of physical and semantic incongruities on auditory event-related potentials. Electroencephalography and Clinical Neurophysiology, 59, 477–488.

McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neo-cortex: Insights from the successes and failures of connectionists models of learning and memory. Psychological Review, 102, 419–457.

McLennan, C. T., Luce, P. A., & Charles-Luce, J. (2003). Representation of lexical form. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 539–553.

McQueen, J., Cutler, A., & Norris, D. (2006). Phonological abstraction in the mental lexicon. Cognitive Science, 30, 1113–1126.

Mirman, D., McClelland, J. L., & Holt, L. L. (2006). An interactive Hebbian account of lexically guided tuning of speech perception. Psychonomic Bulletin & Review, 13, 958–965.

Munro, M., Derwing, T., & Flege, J. (1999). Canadians in Alabama: A perceptual study of dialect acquisition in adults. Journal of Phonetics, 27, 385–403.

Näätänen, R., Lehtokoski, A., Lennes, M., Cheour, M., Huotilainen, M., Iivonen, A., et al. (1997). Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature, 385, 432–434.

Näätänen, R., Tervaniemi, M., Sussman, E., Paavilainen, P., & Winkler, I. (2001). “Primitive intelligence” in the auditory cortex. Trends in Neurosciences, 24, 283–288.

Norris, D., McQueen, J. M., & Cutler, A. (2003). Perceptual learning in speech. Cognitive Psychology, 47, 204–238.

Pallier, C., Bosch, L., & Sebastian-Galles, N. (1997). A limit on behavioral plasticity in speech perception. Cognition, 64, B9–B17.

Pallier, C., Colome, A., & Sebastian-Galles, N. (2001). The influence of native-language phonology on lexical access: Exemplar-based versus abstract lexical entries. Psychological Science, 12, 445–449.

Pallier, C., Dupoux, E., & Jeannin, X. (1997). EXPE: An expandable programming language for on-line psychological experiments. Behavior Research Methods, Instruments & Computers, 29, 322–327.

Pallier, C., Sebastian-Galles, N., Dupoux, E., Christophe, A., & Mehler, J. (1998). Perceptual adjustment to time-compressed speech: A cross-linguistic study. Memory & Cognition, 26, 844–851.

Peltola, M. S., Kujala, T., Tuomainen, J., Ek, M., Aaltonen, O., & Näätänen, R. (2003). Native and foreign vowel discrimination as indexed by the mismatch negativity (MMN) response. Neuroscience Letters, 352, 25–28.

Rafel, J. (1980). Dades sobre la freqüència de les unitats fonològiques del català. Estudis Universitaris Catalans, 24, 473–496.

Rafel i Fontanals, J. (1998). Diccionari de freqüències. 3, dades globals [Frequency dictionary. 3, global data]. Barcelona: Institut d'Estudis Catalans.

Remez, R. E., Rubin, P. E., Berns, S. M., Pardo, J. S., & Lang, J. M. (1994). On the perceptual organization of speech. Psychological Review, 101, 129–156.

Rugg, M. D., & Nagy, M. E. (1987). Lexical contribution to nonword-repetition effects: Evidence from event-related potentials. Memory & Cognition, 15, 473–481.

Sebastian-Galles, N., Dupoux, E., Costa, A., & Mehler, J. (2000). Adaptation to time-compressed speech: Phonological determinants. Perception & Psychophysics, 62, 834–842.

Sebastian-Galles, N., Echeverria, S., & Bosch, L. (2005). The influence of initial exposure on lexical representation: Comparing early and simultaneous bilinguals. Journal of Memory and Language, 52, 240–255.

Sebastian-Galles, N., Rodriguez-Fornells, A., de Diego-Balaguer, R., & Diaz, B. (2006). First- and second-language phonological representations in the mental lexicon. Journal of Cognitive Neuroscience, 18, 1277–1291.

Sebastian-Galles, N., & Soto-Faraco, S. (1999). Online processing of native and non-native phonemic contrasts in early bilinguals. Cognition, 72, 111–123.

Smith, M. E., & Halgren, E. (1987). Event-related brain potentials during lexical decision: Effects of repetition, word-frequency, pronounceability, and concreteness. Electroencephalography and Clinical Neurophysiology, Supplement, 40, 417–421.

Sumner, M., & Samuel, A. G. (2005). Perception and representation of regular variation: The case of final /t/. Journal of Memory and Language, 52, 322–338.