On the basis of previous results showing that music training positively influences different aspects of speech perception and cognition, the aim of this series of experiments was to test the hypothesis that adult professional musicians would learn the meaning of novel words through picture–word associations more efficiently than controls without music training (i.e., fewer errors and faster RTs). We also expected musicians to show faster changes in brain electrical activity than controls, in particular regarding the N400 component that develops with word learning. In line with these hypotheses, musicians outperformed controls in the most difficult semantic task. Moreover, although a frontally distributed N400 component developed in both groups of participants after only a few minutes of novel word learning, in musicians this frontal distribution rapidly shifted to parietal scalp sites, as typically found for the N400 elicited by known words. Finally, musicians showed evidence for better long-term memory for novel words 5 months after the main experimental session. Results are discussed in terms of cascading effects from enhanced perception to memory as well as in terms of multifaceted improvements of cognitive processing due to music training. To our knowledge, this is the first report showing that music training influences semantic aspects of language processing in adults. These results open new perspectives for education in showing that early music training can facilitate later foreign language learning. Moreover, the design used in the present experiment can help to specify the stages of word learning that are impaired in children and adults with word learning difficulties.

The aim of the present experiment was to examine the influence of music training on word learning using both behavioral and electrophysiological measures. On the basis of the evidence reviewed below, we tested the hypothesis that musicians would be more efficient at word learning than nonmusicians and that the differences would be reflected in their pattern of brain waves. There is strong evidence from previous cross-sectional studies comparing adult musicians and nonmusicians that long-term music training promotes brain plasticity (Münte, Altenmüller, & Jäncke, 2002) in modifying the functional (Schneider et al., 2002; Pantev et al., 1998) and structural (Elmer, Hänggi, Meyer, & Jäncke, 2013; Gaser & Schlaug, 2003; Schneider et al., 2002) architecture of the auditory pathway. Results of longitudinal studies, mostly in children, showed that music training can be the cause of the observed effects (François, Chobert, Besson, & Schön, 2013; Strait, Parbery-Clark, O'Connell, & Kraus, 2013; Chobert, François, Velay, & Besson, 2012; Moreno et al., 2011; Hyde et al., 2009; Moreno et al., 2009). Most importantly for the present purposes, there is also evidence that music training improves different aspects of speech processing (for review, see Asaridou & McQueen, 2013; Besson, Chobert, & Marie, 2011; Kraus & Chandrasekaran, 2010). These transfer effects possibly arise because speech and music are auditory signals relying on similar acoustic cues (i.e., duration, frequency, intensity, and timbre) and because they share, at least in part, common neuronal substrates for auditory perception (Peretz, Vuvan, Lagrois, & Armony, 2015; Jäncke, 2009) and for higher-order cognitive processing (Rogalsky, Rong, Saberi, & Hickok, 2011; Patel, 2008; Maess, Koelsch, Gunter, & Friederici, 2001). 
For instance, music training facilitates the processing of a variety of segmental (Bidelman, Weiss, Moreno, & Alain, 2014; Kühnis, Elmer, & Jäncke, 2014; Elmer, Meyer, & Jäncke, 2012; Chobert, Marie, François, Schön, & Besson, 2011; Musacchia, Sams, Skoe, & Kraus, 2007) and suprasegmental speech attributes (Marie, Delogu, Lampis, Olivetti Belardinelli, & Besson, 2011; Wong & Perrachione, 2007) within native (Schön, Magne, & Besson, 2004) and nonnative languages (Marques, Moreno, Castro, & Besson, 2007). Moreover, both musically trained children (Jentschke & Koelsch, 2009) and adults (Fitzroy & Sanders, 2013) are more sensitive to violations of linguistic and music syntax than participants without music training. Perhaps most importantly, recent results also showed that long-term music training positively improves cognitive functions such as auditory attention (Strait, Slater, O'Connell, & Kraus, 2015), visual attention (Wang, Ossher, & Reuter-Lorenz, 2015), working and verbal memory (George & Coch, 2011; Ho, Cheung, & Chan, 2003), executive functions (Zuk, Benjamin, Kenyon, & Gaab, 2014; Moreno et al., 2011; Pallesen et al., 2010), and general intelligence (Schellenberg, 2004). These findings are not surprising insofar as playing an instrument at a professional level is a multidimensional task that, together with specific motor abilities, requires acute auditory perception and focused attention, code switching between the visual information on the score and the corresponding sounds, as well as the ability to maintain auditory information in short- and long-term memory. Taken together, these results are in line with dynamic models of human cognition (Friederici & Singer, 2015; Hagoort, 2014) positing that language—and possibly music—are processed in interaction with other cognitive functions.

Similar to playing music, word learning is also a multidimensional task requiring both perceptive and higher-order cognitive abilities. Let us take the example of Thai. Thai is a tonal and a quantitative language that mainly comprises monosyllabic words (like Mandarin Chinese and other tonal languages) and in which both tonal (i.e., five tones) and vowel length contrasts are linguistically relevant for understanding word meaning (e.g., /pa1/ low tone with a short vowel means “to find” and /pa:1/ low tone with a long vowel means “forest”; Gandour et al., 2002). Thus, when it comes to learning novel words in Thai, the learner has to focus attention on the acoustic stream to discriminate spectral and temporal phonetic contrasts and to build new phonological representations that can then be associated with lexical meaning by recruiting working, short-term, episodic, and semantic memory processes. Consequently, if music skills translate into improved auditory perception and attention together with enhanced working and verbal memory, it should be easier for musicians to learn a language such as Thai.

The ERP method is particularly well suited to capturing the fast temporal dynamics of word learning and to examining brain plasticity, as reflected by changes in the amplitude and/or latency of ERP components during learning. Previous results in adults have shown that the N400, a negative-going component that typically develops between 300 and 600 msec poststimulus onset (Kutas & Hillyard, 1980), increases in amplitude when meaningless items acquire meaning. Specifically, results showed N400 enhancements in native English speakers after 14 hr of learning the meaning of novel French words (McLaughlin, Osterhout, & Kim, 2004) and after 45 min of learning the meaning of rare words (e.g., “clowder”; Perfetti, Wlotko, & Hart, 2005). Moreover, if a novel word (Borovsky, Elman, & Kutas, 2012; Borovsky, Kutas, & Elman, 2010; Mestres-Missé, Rodriguez-Fornells, & Münte, 2007) or pseudoword (Batterink & Neville, 2011) is presented in a strongly constrained and meaningful context, even a single exposure can be sufficient to build up initial word representations, an effect referred to as “fast mapping” (Carey, 1978). An incubation-like period and further exposures are then required for consolidation and integration into existing lexical networks (Dumay & Gaskell, 2007). Thus, the N400 is taken as a reliable index of word learning, reflecting the formation of semantic representations.

Note though that the N400 component at the core of the above-mentioned experiments clearly showed a more frontal scalp distribution (Borovsky et al., 2010; Mestres-Missé et al., 2007) than the centroparietal N400 typically elicited by already known words (Kutas, Van Petten, & Besson, 1988). This frontal N400 distribution is compatible with results showing that prefrontal and temporal brain regions are associated with the maintenance of novel information in working or short-term memory and the formation of new associations (Hagoort, 2014) and/or with the initial building-up of word representations in episodic memory (Rodriguez-Fornells, Cunillera, Mestres-Missé, & De Diego-Balaguer, 2009; Wagner et al., 1998).

As mentioned above, most studies of music-to-language transfer effects have focused on segmental, suprasegmental, and syntactic processing levels. On the basis of the results on word learning reviewed above, this study aimed at going a step further so as to determine whether professional music training would also influence the semantic level of processing, most often considered as language-specific (but see Koelsch et al., 2004) by facilitating the learning process of novel word meaning. The general hypothesis was that the optimization of perceptual and cognitive functions in professional musicians would positively influence the speed and quality of word learning as reflected by a behavioral advantage for musicians (i.e., lower error rates [ERRs] and faster RTs). Moreover, based on the ERPs and word learning literature (Borovsky et al., 2010, 2012; Batterink & Neville, 2011; Mestres-Missé et al., 2007; Perfetti et al., 2005; McLaughlin et al., 2004), we expected a frontally distributed N400 component to develop in all participants during the early stages of novel word learning. However, if the perceptual and cognitive computations involved in word learning were facilitated in musicians, the development of the N400 component should be faster in musicians than in controls in the learning phase. By contrast, we expected the N400 to show a centroparietal distribution when novel word learning was consolidated.

To test these general hypotheses, we used an ecologically valid experimental design inspired by Wong and Perrachione (2007) and based on a series of four experiments that comprised several tasks performed during the main experimental session (see Figure 1A–E). First, to further test the hypothesis of improved auditory speech discrimination in musicians compared with controls, participants performed a phonological categorization task at the beginning and at the end of the main experimental session (see Figure 1A and E). On the basis of previous results showing that musicians are advantaged when the discrimination is most difficult (Diamond, 2013; Schön et al., 2004), we expected musicians to outperform controls in identifying phonemic contrasts that are not relevant for lexical discrimination in French. Moreover, based on previous literature reporting that the N100 component reflects encoding of auditory cues in the auditory-related cortex (Kühnis et al., 2014) and is influenced by auditory attention and perceptual learning (Seppänen, Hämäläinen, Pesonen, & Tervaniemi, 2012; Woldorff & Hillyard, 1991), we expected this behavioral advantage to be accompanied by an increased N100 amplitude in musicians.

Figure 1. 

Experimental design. Participants performed a series of tasks in the main experimental session (A–E): First, in the phonological categorization task (A), nine natural Thai monosyllabic words had to be categorized based on voicing (Task 1), vowel length (Task 2), pitch (Task 3), or aspiration contrasts (Task 4). Second, in the word learning phase (B), each word was paired with its respective picture. This phase included two separate blocks of trials. Third, in the matching task (C), the words were presented with one of the pictures, either matching or mismatching the previously learned associations. This task included two separate blocks of trials. Fourth, in the semantic task (D), the words were presented with novel pictures that were either semantically related or unrelated to the novel words. Again, this task included two separate blocks of trials. Fifth, participants again performed the four tasks of the phonological categorization task (E). Finally, participants came back 5 months after the main session to perform the matching and semantic tasks again (F).


Second, participants were asked to learn the meaning of the novel words through picture–word associations (see Figure 1B), a design that has often been used in word learning experiments in children (Friedrich & Friederici, 2008; Torkildsen et al., 2008) and in adults (Dobel, Lagemann, & Zwitserlood, 2009). No behavioral response was required during this word learning phase, but ERPs were recorded to test the main hypothesis that frontally distributed N400 components would develop in both groups of participants (François et al., 2013; Borovsky et al., 2010; Rodriguez-Fornells et al., 2009) but with faster temporal dynamics in musicians than in controls.

Third, to test for the efficacy of the learning phase, participants performed a matching task and were asked to decide whether a picture–word pair matched or mismatched the previously learned pairs (see Figure 1C). Fourth, an important aspect was to determine whether word learning was specific to the picture–word pairs learned during the word learning phase or whether the meaning of the newly learned words was already integrated into semantic networks so that priming effects generalized to new pictures. To this end, participants performed a semantic task during which novel pictures that had not been seen in the previous tasks were presented. They were asked to decide whether the picture and the word were semantically related or unrelated (see Figure 1D). In both the matching and the semantic tasks and in both groups of participants, we predicted that N400 amplitudes would be larger for mismatch and semantically unrelated words than for match and semantically related words (i.e., the typical N400 effect; Kutas & Hillyard, 1980), thereby showing that participants had learned the meaning of the novel words. Moreover, if the novel words' meanings were already integrated into existing semantic networks at the end of the word learning phase (Borovsky et al., 2012; Batterink & Neville, 2011), we expected the N400 effect (mismatching–matching and unrelated–related words) in the matching and semantic tasks to show the centroparietal distribution typically found for already known words (Kutas et al., 1988). Of main interest was to specify the spatiotemporal dynamics of the N400 effect in musicians and in nonmusicians. Finally, if we were to find that music training influenced word learning, then we expected musical ability to be positively correlated with word learning efficacy, as reflected by behavioral measures and/or the N400 effect in the matching and semantic tasks.

Finally, a subset of the participants was behaviorally retested after 5 months (see Figure 1F) in the matching and semantic tasks to evaluate the maintenance of novel words in long-term memory. It was of interest to determine whether the behavioral advantages of musicians in a variety of cognitive domains, as reviewed above, extend to long-term memory. To the best of our knowledge, this aspect has not yet been investigated.

In summary, this experimental design is relevant for specifying whether music expertise influences the semantic level of speech processing, an issue that, to our knowledge, has not been addressed before. By analyzing ERPs, we aimed at better understanding the dynamics of word learning, how fast semantic processes develop, and whether and how the N400 is influenced by music training. Showing that long-term music training with an early start (as it is most often the case in professional musicians) may facilitate foreign language learning later in life should add evidence to the claim that music training has important societal consequences for education (Besson et al., 2011; Kraus & Chandrasekaran, 2010). Finally, this experimental design is of potential interest for clinical research: using several different tasks that call upon several perceptual and cognitive functions (phonological categorization, formation of picture–word associations, maintaining these associations in short-term and long-term memory and generalization of learning effects) within the same patient may help specify the processing stages that are deficient in adults or children with language learning disorders.

Participants

A total of 30 participants, comprising 15 professional musicians (MUS, eight women) and 15 controls without formal music training (nonmusicians, NM, eight women) who were nonetheless involved in a regular leisure activity (e.g., sports, dance, theater), were paid to participate in an experimental session lasting 2.5 hr (including the application of the Electrocap, psychometric measurements, and experimental tasks). The two groups did not differ in age (MUS: mean age = 25.1 years, age range = 19–30, SD = 3.9; NM: mean age = 25.7 years, age range = 19–33, SD = 4.8; F(1, 28) = 0.02, p = .68). All participants were native French speakers, had comparable education levels (university degree) and socioeconomic background (criteria of the National Institute of Statistics and Economic Studies; MUS: 4.4; NM: 4.9; F(1, 28) = 1.45, p = .24), and reported no past or current audiological or neurological deficits. MUS practiced their instruments for an average of 17 years (range = 11–24, SD = 4.1) and the musician group included three pianists, two accordionists, four violinists, one cellist, two guitarists, one hornist, one tubist, and one flautist. None of the participants was bilingual, but all spoke English as a second language and most participants (except for 1 MUS and 3 NM) had a rudimentary knowledge of a third language that was neither tonal nor quantitative. The study was conducted in accordance with the Declaration of Helsinki, and all participants gave their informed consent before enrolling in the experiment.

Screening Measures

Cognitive Ability

Standardized psychometric tests were used to examine short-term and working memory (forward and reverse Digit Span, WISC-IV; Wechsler, 2003), visual attention (NEPSY from Korkman, Kirk, & Kemp, 1998) and nonverbal general intelligence (progressive matrices, PM47; Raven, Corporation, & Lewis, 1962).

Musical Aptitude

Participants performed two musicality tests (adapted from the MBEA battery; Peretz, Champod, & Hyde, 2003) consisting in judging whether pairs of piano melodies were same or different, based either on melodic or on rhythmic information.

Experimental Stimuli

Auditory Stimuli

Nine natural Thai monosyllabic words were selected for the experiment: /ba1/, /pa1/, /pha1/, /ba:1/, /pa:1/, /pha:1/, /ba:0/, /pa:0/, /pha:0/.1 These words varied in vowel duration, with short (/ba1/, /pa1/ and /pha1/; 261 msec on average) and long vowels (/ba:1/, /pa:1/, /pha:1/, /ba:0/, /pa:0/ and /pha:0/; 531 msec on average), and in fundamental frequency, with low-tone (/ba1/, /pa1/, /pha1/, /ba:1/, /pa:1/ and /pha:1/; F0 = 175 Hz on average) and midtone vowels (/ba:0/, /pa:0/ and /pha:0/; F0 = 218 Hz on average). Furthermore, words contained voicing contrasts (/ba1/, /ba:1/ and /ba:0/, VOT = −144 msec vs. /pa1/, /pa:1/ and /pa:0/, VOT = 3 msec) as well as aspiration contrasts (/pa1/, /pa:1/ and /pa:0/, VOT = 3 msec vs. /pha1/, /pha:1/ and /pha:0/, VOT = 77 msec).2 Stimuli were recorded by a female Thai–French bilingual, ensuring that all words were produced naturally. For each word, five versions were digitally recorded to reproduce natural speech variability. Sound pressure level was normalized across all words to a mean level of 70 dB by using Praat software (Boersma & Weenink, 2011).

Visual Stimuli

For the learning phase, nine pictures representing familiar objects (i.e., bear, flower, key, chair, bell, eye, strawberry, train, glass) were selected based on the standardized set of 260 pictures (that are matched for name and image agreement, familiarity, and visual complexity) built by Snodgrass and Vanderwart (1980).3 The same pictures as in the learning phase were then presented in the matching task. For the semantic task, 54 new pictures that the participants had not seen before in the experiment and that were semantically related or unrelated to the meaning of the newly learned words were chosen from the Internet by two of the authors (ED and MB). Students from our university (n = 60; age range = 19–25 years) were asked to rate the semantic relatedness between new and old pictures (that is, those previously presented during the word learning phase). Half of the presented pairs were semantically related and the other half were semantically unrelated, and this was confirmed by the students' ratings.

Experimental Tasks

Participants were tested individually in a quiet experimental room (i.e., Faraday cage), where they sat in a comfortable chair at about 1 m from a computer screen. Auditory stimuli were presented through HiFi headphones (HD590, Sennheiser Electronic GmbH, Wedemark, Germany) at 70-dB sound pressure level. Presentation of visual and auditory stimuli as well as the collection of behavioral data were controlled by Presentation software (Version 11.0, Neurobehavioral Systems, Berkeley, CA).

Main Experimental Session (See Figure 1A–E)

Phonological categorization task

At the beginning and at the end of the experiment, participants performed four different phonological tasks that lasted for 2.3 min each. All nine Thai monosyllabic words were presented in each task, but participants were asked to categorize them based upon different features in each task: (1) the voicing contrast (e.g., /ba1/ vs. /pa1/), (2) the vowel length (e.g., short: /ba1/ vs. long /ba:1/), (3) pitch (e.g., low: /pa:1/ vs. high: /ba:0/), and (4) the aspiration contrast (e.g., /pa1/ vs. /pha1/; see Figure 1A and E). For each task, the contrast was visually represented on the left (e.g., “short” with a short line) and right (e.g., “long” with a long line) half of the screen and participants had to press one of two response buttons according to the correct side (e.g., left one for short and right one for long vowels), as quickly and accurately as possible. Each word was presented 10 times in a pseudorandomized order with the constraints of no immediate repetition of the same word and no more than four successive same responses. Task order and response side were counterbalanced across participants.
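The pseudorandomization constraints described above (no immediate repetition of the same word and no more than four successive same responses) can be illustrated with a short Python sketch. The function name, the response mapping (shown here for the vowel length task), the weighted sequential sampling with restarts, and the seed are all assumptions made for illustration; the procedure actually used to generate the trial lists is not reported.

```python
import random
from collections import Counter

# The nine Thai words used as stimuli.
WORDS = ["ba1", "pa1", "pha1", "ba:1", "pa:1", "pha:1", "ba:0", "pa:0", "pha:0"]

# Assumed response mapping for the vowel length task: ":" marks a long vowel.
RESPONSE = {w: ("long" if ":" in w else "short") for w in WORDS}


def build_sequence(items, response_of, n_reps=10, max_run=4, seed=None,
                   max_restarts=1000):
    """Build a trial order in which no item is immediately repeated and no
    more than `max_run` successive trials require the same response.

    Trials are drawn one at a time (weighted by how many presentations of
    each item remain); if a dead end is reached, the attempt is restarted.
    """
    rng = random.Random(seed)
    n_trials = n_reps * len(items)
    for _ in range(max_restarts):
        remaining = Counter({item: n_reps for item in items})
        seq, run = [], 0
        while len(seq) < n_trials:
            prev = seq[-1] if seq else None
            candidates = [
                w for w in items
                if remaining[w] > 0
                and w != prev
                and not (prev is not None and run >= max_run
                         and response_of[w] == response_of[prev])
            ]
            if not candidates:
                break  # dead end: abandon this attempt and restart
            weights = [remaining[w] for w in candidates]
            choice = rng.choices(candidates, weights=weights, k=1)[0]
            same = prev is not None and response_of[choice] == response_of[prev]
            run = run + 1 if same else 1
            seq.append(choice)
            remaining[choice] -= 1
        else:
            return seq  # all trials placed without violating a constraint
    raise RuntimeError("no valid sequence found; relax the constraints")


sequence = build_sequence(WORDS, RESPONSE, n_reps=10, seed=1)
```

With nine words presented 10 times each, the sketch returns a list of 90 trials. Because long vowels outnumber short vowels two to one in this stimulus set, a plain shuffle would almost never satisfy both constraints, which is why the sketch builds the order sequentially.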

Word learning phase

Participants were asked to learn the meaning of each word previously presented in the phonological categorization task using picture–word associations. For instance, a drawing of a bear was followed by the auditory presentation of the word /ba1/, and thus, /ba1/ was the word for bear in our “foreign” language (see Figure 1B). Each of the nine picture–word pairs was presented 20 times, resulting in 180 trials that were pseudorandomly presented (i.e., no immediate repetition of the same association) in two blocks of 3 min each. The picture was presented first and then followed after 750 msec by one of the nine words. Total trial duration was 2000 msec. Two different lists were built, so that across participants different pictures were associated with different words. No behavioral response was required from the participants but they were told that subsequent tests would evaluate whether they learned the meaning of the novel words.

Matching task

One of the nine pictures was presented, followed after 750 msec by an auditory word that matched or mismatched the associations previously learned in the word learning phase. For instance, whereas the drawing of a bear followed by /ba1/ (i.e., bear) was a match, the drawing of a strawberry followed by /ba1/ was a mismatch (see Figure 1C). Participants were asked to press one of two response keys accordingly, as quickly and accurately as possible. Response hand was counterbalanced across participants. At the end of the trial, a row of XXXX appeared on the screen, and participants were asked to blink during this time period (1000 msec; total trial duration: 3750 msec) to minimize eye movement artifacts during word presentation. Each word was presented 20 times, half in match condition and half in mismatch condition. The total of 180 trials was pseudorandomly presented (i.e., no immediate repetition of the same association and no more than four successive same responses) within two blocks of 5.6 min each.

Semantic task

One of the new pictures was presented, followed after 1500 msec by an auditory word that was semantically related or unrelated. For instance, although the picture of a lock was semantically related to the previously learned word /pa:1/ (i.e., “key”), the picture of a strawberry cake was semantically unrelated to /pa:1/ (see Figure 1D). Participants were asked to press one of two response keys accordingly, as quickly and accurately as possible. A familiarization task including four trials was administered before starting the task. Response hand was counterbalanced across participants. At the end of the trial, a row of XXXX appeared on the screen, and participants were asked to blink during this time period (1000 msec; total trial duration = 4500 msec). Each word was presented 12 times, but none of the new pictures were repeated, so that on each trial the word was associated with a different related or unrelated picture. Half of the picture–word pairs were semantically related, and half were semantically unrelated. A total of 108 trials was presented pseudorandomly (i.e., no immediate repetition of the same association and no more than four successive same responses) within two blocks of 4 min each.
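As a consistency check on the trial counts and durations reported for the word learning phase, the matching task, and the semantic task, the block durations can be recomputed from the number of trials and the total trial duration. The following Python sketch does this; the dictionary layout and function name are illustrative assumptions, whereas the numbers are those given in the task descriptions.

```python
# (number of words, presentations per word, trial duration in msec, blocks)
TASKS = {
    "word learning": (9, 20, 2000, 2),
    "matching": (9, 20, 3750, 2),
    "semantic": (9, 12, 4500, 2),
}


def task_summary(n_words, n_reps, trial_ms, n_blocks):
    """Return the total number of trials and the duration of one block (min)."""
    n_trials = n_words * n_reps
    total_min = n_trials * trial_ms / 1000 / 60
    return n_trials, total_min / n_blocks


summaries = {name: task_summary(*spec) for name, spec in TASKS.items()}

for name, (n_trials, block_min) in summaries.items():
    print(f"{name}: {n_trials} trials, {block_min:.3f} min per block")
```

The computed values (180 trials in two blocks of 3 min, 180 trials in two blocks of about 5.6 min, and 108 trials in two blocks of about 4 min) agree with the durations reported in the text.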

Long-term Memory Session (See Figure 1F)

To test for long-term memory effects on behavior (i.e., ERRs and RTs), participants again performed the matching task (always administered first) and the semantic task 5 months after the main experimental session (no ERPs were recorded). Because of a dropout rate of 33%, only 10 participants were retested in each group. In the matching task, a total of 270 trials were presented within three blocks. In the semantic task, a total of 216 trials were presented in two blocks (with a short pause within each block).

EEG Data Acquisition

The EEG was continuously recorded at a sampling rate of 512 Hz with a band-pass filter of 0–102.4 Hz by using a Biosemi amplifier system (BioSemi Active 2, Amsterdam, The Netherlands) with 32 active Ag/Cl electrodes (Biosemi Pintype) located at standard positions according to the International 10/20 System (Jasper, 1958). The EOG was recorded from flat-type active electrodes placed 1 cm to the left and right of the external canthi and from an electrode beneath the right eye. Two additional electrodes were placed on the left and right mastoids. Electrode impedance was kept below 5 kΩ. EEG data were analyzed using Brain Vision Analyzer software (Version 1.05.0005 & Version 2.1.0; Brain Products, München, Germany). All data were re-referenced offline to the averaged left and right mastoids, filtered with a bandpass filter from 1 to 30 Hz (slope of 24 dB/oct), and independent component analysis and inverse independent component analysis were used to identify and remove components associated with vertical and horizontal ocular movements. Finally, DC-detrend and removal of artifacts above a gradient criterion of 10 μV/msec or a max–min criterion of 100 μV over the entire epoch were applied automatically. For each participant, ERPs were time-locked to word onset, segmented into 2700 msec epochs, including a 200-msec baseline and averaged within each condition. Individual averages were then averaged together to obtain the grand average across all participants.
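The two automatic artifact criteria and the baseline correction mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions: the gradient criterion is interpreted as the voltage step between adjacent samples divided by the sampling interval, the 200-msec baseline is rounded to a whole number of samples, and the function names are hypothetical. It is not the implementation used by Brain Vision Analyzer.

```python
SAMPLING_RATE_HZ = 512          # sampling rate reported in the text
MAX_GRADIENT_UV_PER_MS = 10.0   # gradient criterion reported in the text
MAX_RANGE_UV = 100.0            # max-min criterion reported in the text


def is_artifact(epoch_uv, sampling_rate_hz=SAMPLING_RATE_HZ):
    """Flag an epoch (list of microvolt samples from one channel) that
    violates the gradient criterion or the max-min criterion."""
    ms_per_sample = 1000.0 / sampling_rate_hz
    # Largest voltage step allowed between two adjacent samples (assumption:
    # the gradient is evaluated sample to sample).
    max_step_uv = MAX_GRADIENT_UV_PER_MS * ms_per_sample
    too_steep = any(abs(b - a) > max_step_uv
                    for a, b in zip(epoch_uv, epoch_uv[1:]))
    too_wide = (max(epoch_uv) - min(epoch_uv)) > MAX_RANGE_UV
    return too_steep or too_wide


def baseline_correct(epoch_uv, baseline_ms=200,
                     sampling_rate_hz=SAMPLING_RATE_HZ):
    """Subtract the mean of the prestimulus baseline from every sample.
    Assumes the epoch starts at the beginning of the baseline."""
    n_baseline = round(baseline_ms * sampling_rate_hz / 1000)
    baseline = sum(epoch_uv[:n_baseline]) / n_baseline
    return [v - baseline for v in epoch_uv]


# Hypothetical example epochs (microvolts).
clean = [0.0, 5.0, 10.0, 5.0, 0.0]
jump = [0.0, 25.0, 0.0]                          # step of 25 uV in one sample
drift = [float(v) for v in range(0, 121, 10)]    # slow drift spanning 120 uV
```

At 512 Hz the sampling interval is about 1.95 msec, so under this interpretation a 10 μV/msec criterion corresponds to a maximal step of about 19.5 μV between adjacent samples.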

Statistical Analyses

ANOVAs were computed using Statistica software (Version 12.0, StatSoft, Inc., Tulsa, OK). For ERRs and RTs, ANOVAs always included Group (MUS vs. NM) as between-subject factor as well as specific factors for each task. As the phonological categorization task was the only task that was performed both at the beginning and at the end of the experiment, Order (preexperiment vs. postexperiment) was included as a within-subject factor together with Task (Voicing vs. Vowel length vs. Pitch vs. Aspiration). The matching and semantic tasks were only performed once but each experiment was divided into two blocks of trials so that factors were Block (1 vs. 2) and Condition (match vs. mismatch or related vs. unrelated). To further increase clarity, factors were again specified at the beginning of each task in the Results section. On the basis of ERRs, four outliers (2 MUS and 2 NM, ±2 SD away from the mean) were excluded from further analyses.
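The ±2 SD exclusion rule mentioned above can be illustrated with a minimal Python sketch. The participant labels and error rates below are hypothetical, and the sketch assumes that the mean and the sample standard deviation are computed over the set of values passed in; whether the criterion was applied within each group or across all participants is not specified in the text.

```python
from statistics import mean, stdev


def split_outliers(error_rates, n_sd=2.0):
    """Split a {participant: error rate} mapping into kept and excluded
    participants, excluding values more than n_sd SDs from the mean."""
    values = list(error_rates.values())
    m, sd = mean(values), stdev(values)
    kept, excluded = {}, {}
    for participant, err in error_rates.items():
        target = excluded if abs(err - m) > n_sd * sd else kept
        target[participant] = err
    return kept, excluded


# Hypothetical error rates (%) for ten participants of one group.
example = {"P01": 10, "P02": 11, "P03": 9, "P04": 10, "P05": 12,
           "P06": 10, "P07": 11, "P08": 9, "P09": 10, "P10": 40}
kept, excluded = split_outliers(example)
```

In this made-up example only the participant with an error rate of 40% falls outside the ±2 SD range and is excluded.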

For ERPs, we analyzed the early stages of auditory processing in the phonological categorization task using N100 peak amplitude measures. By contrast, during the word learning phase, as well as in the matching and semantic tasks, we focused on semantic processing and we analyzed the mean amplitude of the N400 component. Effects on the N200 were also analyzed using mean amplitudes. Because ERPs were only analyzed for correct responses and because the ERP traces of the four outliers that were eliminated from behavioral analyses were similar to the grand average in each group, all participants (i.e., 15 MUS and 15 NM) were included in the ERP analyses. ANOVAs always included Group (MUS vs. NM) as a between-subject factor and Laterality (left: F3, C3, P3; midline: Fz, Cz, Pz; right: F4, C4, P4) and Anterior/Posterior (frontal: F3, Fz, F4; central: C3, Cz, C4; parietal: P3, Pz, P4) as within-subject factors, together with specific factors for each task. As for behavior, for the phonological categorization task, these factors were Order (pre vs. post) and Task (Voicing vs. Vowel length vs. Pitch vs. Aspiration). For the matching and semantic tasks, factors were Block (1 vs. 2) and Condition (match vs. mismatch or related vs. unrelated). Post hoc Tukey tests (reducing the probability of Type I errors) were used to determine the origin of significant main effects and interactions. To simplify results presentation, we only report significant results related to our hypotheses (full statistical results can be seen in Table 1). Finally, correlation analyses (Pearson's coefficient) were computed between error rates in the musicality task with error rates or N400 effects in the semantic task. 
General linear models (including Group as a categorical factor, error rates in the musicality task as a continuous factor, and error rates or N400 effects in the semantic task as a dependent factor) were used to test whether the differences between the slopes and intercepts of the two groups were significant.
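To make the ERP measures concrete, the following Python sketch computes the mean amplitude in the N400 latency window (340–550 msec, as listed in Table 1) and the N400 effect as the difference between two conditions, and it encodes the 3 × 3 electrode grid defined by the Laterality and Anterior/Posterior factors. The function names, the rounding of latencies to sample indices, and the synthetic waveform are assumptions made for illustration; the analyses reported here were run in Brain Vision Analyzer and Statistica.

```python
# Electrode grid defined by the two topographical factors.
LATERALITY = {
    "left": ["F3", "C3", "P3"],
    "midline": ["Fz", "Cz", "Pz"],
    "right": ["F4", "C4", "P4"],
}
ANTERIOR_POSTERIOR = {
    "frontal": ["F3", "Fz", "F4"],
    "central": ["C3", "Cz", "C4"],
    "parietal": ["P3", "Pz", "P4"],
}


def electrode_at(laterality, region):
    """Return the single electrode at the crossing of the two factor levels."""
    (name,) = set(LATERALITY[laterality]) & set(ANTERIOR_POSTERIOR[region])
    return name


def mean_amplitude(erp_uv, start_ms, end_ms, sampling_rate_hz=512,
                   baseline_ms=200):
    """Mean voltage between start_ms and end_ms after word onset.
    Assumes erp_uv[0] lies baseline_ms before word onset."""
    first = round((start_ms + baseline_ms) * sampling_rate_hz / 1000)
    last = round((end_ms + baseline_ms) * sampling_rate_hz / 1000)
    window = erp_uv[first:last + 1]
    return sum(window) / len(window)


def n400_effect(mismatch_uv, match_uv, start_ms=340, end_ms=550):
    """N400 effect: mean amplitude for mismatching (or unrelated) words
    minus mean amplitude for matching (or related) words."""
    return (mean_amplitude(mismatch_uv, start_ms, end_ms)
            - mean_amplitude(match_uv, start_ms, end_ms))


# Synthetic averaged waveforms: flat except for a -2 uV deflection in the
# samples that fall within the 340-550 msec window of the mismatch condition.
match_erp = [0.0] * 1385
mismatch_erp = [0.0] * 276 + [-2.0] * 109 + [0.0] * 1000
effect = n400_effect(mismatch_erp, match_erp)
```

A negative value of `effect` indicates a larger N400 for mismatching than for matching words; the value obtained at each of the nine electrodes would then enter the ANOVA with Laterality and Anterior/Posterior as within-subject factors.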

Table 1. 

Results of ANOVAs on the ERP Data in the Different Tasks of the Main Experimental Session

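N400 (340–550 msec)

| Effect | Learning ALL F | Learning ALL p | Learning MUS F | Learning MUS p | Learning NM F | Learning NM p | Matching ALL F | Matching ALL p | Matching MUS F | Matching MUS p | Matching NM F | Matching NM p | Semantic ALL F | Semantic ALL p | Semantic MUS F | Semantic MUS p | Semantic NM F | Semantic NM p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G | 2.72 | .11 | – | – | – | – | 3.85 | .06 | – | – | – | – | 0.93 | .34 | – | – | – | – |
| G × B | 0.12 | .73 | – | – | – | – | 3.23 | .08 | – | – | – | – | 0.02 | .89 | – | – | – | – |
| G × L | 0.41 | .67 | – | – | – | – | 0.21 | .82 | – | – | – | – | 0.21 | .81 | – | – | – | – |
| G × R | 0.13 | .88 | – | – | – | – | 0.75 | .48 | – | – | – | – |  |  | – | – | – | – |
| G × B × L |  |  | – | – | – | – | 0.07 | .93 | – | – | – | – | 0.37 | .69 | – | – | – | – |
| G × B × R | 1.87 | .16 | – | – | – | – | 2.40 | .10 | – | – | – | – | 0.01 | .99 | – | – | – | – |
| G × L × R | 0.67 | .62 | – | – | – | – | 1.27 | .29 | – | – | – | – | 2.23 | .07 | – | – | – | – |
| G × B × L × R | 0.82 | .52 | – | – | – | – | 0.40 | .81 | – | – | – | – | 0.60 | .67 | – | – | – | – |
| G × C | – | – | – | – | – | – | 1.61 | .21 | – | – | – | – | 1.06 | .31 | – | – | – | – |
| G × B × C | – | – | – | – | – | – | 0.42 | .52 | – | – | – | – | 0.84 | .37 | – | – | – | – |
| G × C × L | – | – | – | – | – | – | 1.25 | .29 | – | – | – | – | 0.21 | .81 | – | – | – | – |
| G × C × R | – | – | – | – | – | – |  |  | – | – | – | – | 0.04 | .96 | – | – | – | – |
| G × B × C × L | – | – | – | – | – | – | 0.83 | .44 | – | – | – | – | 0.01 | .99 | – | – | – | – |
| G × B × C × R | – | – | – | – | – | – | 1.63 | .21 | – | – | – | – | 0.15 | .86 | – | – | – | – |
| G × C × L × R | – | – | – | – | – | – | 1.82 | .13 | – | – | – | – | 1.19 | .32 | – | – | – | – |
| G × B × C × L × R | – | – | – | – | – | – | 1.97 | .43 | – | – | – | – | 0.52 | .72 | – | – | – | – |
| B |  |  | 1.57 | .23 | 2.72 | .12 | 3.13 | .09 | <.001 | .99 | 7.68 | .02 | 1.69 | .20 | 0.63 | .44 | 1.11 | .31 |
| B × L |  |  |  |  | 0.10 | .91 | 0.52 | .60 | 0.28 | .76 | 0.31 | .74 | 0.81 | .45 | 0.25 | .78 | 0.91 | .42 |
| B × R | 1.32 | .27 |  |  | 0.12 | .89 | 1.46 | .24 | 0.08 | .93 | 5.08 | .01 | 0.36 | .70 | 0.10 | .90 | 0.32 | .73 |
| B × L × R | 0.42 | .80 | 0.68 | .61 | 0.54 | .71 | 1.06 | .38 | 0.84 | .50 | 0.54 | .71 | 0.65 | .63 | 0.71 | .59 | 0.51 | .73 |
| C | – | – | – | – | – | – | 0.01 | .95 | 0.53 | .48 | 1.38 | .26 | 0.10 | .75 | 0.65 | .43 | 0.42 | .53 |
| C × B | – | – | – | – | – | – | 1.17 | .29 | 0.07 | .79 | 2.06 | .17 | 1.46 | .24 | 0.04 | .85 | 2.43 | .14 |
| C × L | – | – | – | – | – | – |  | <.001 |  |  |  |  | 1.41 | .25 | 0.19 | .83 | 2.25 | .12 |
| C × R | – | – | – | – | – | – |  |  |  |  |  |  |  |  |  |  |  |  |
| C × B × L | – | – | – | – | – | – | 0.47 | .63 | 1.02 | .38 | 0.39 | .68 | 0.45 | .64 | 0.43 | .65 | 0.13 | .88 |
| C × B × R | – | – | – | – | – | – | 2.43 | .10 | 2.99 | .07 | 0.09 | .91 | 1.12 | .33 | 1.36 | .27 | 0.18 | .84 |
| C × B × L × R | – | – | – | – | – | – | 1.29 | .28 | 1.11 | .36 | 1.15 | .34 | 0.59 | .67 | 0.57 | .69 | 0.54 | .71 |

N200 (230–340 msec)

| Effect | Learning ALL F | Learning ALL p | Learning MUS F | Learning MUS p | Learning NM F | Learning NM p | Matching ALL F | Matching ALL p | Matching MUS F | Matching MUS p | Matching NM F | Matching NM p | Semantic ALL F | Semantic ALL p | Semantic MUS F | Semantic MUS p | Semantic NM F | Semantic NM p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G | 1.24 | .28 | – | – | – | – |  |  | – | – | – | – | 1.49 | .23 | – | – | – | – |
| G × B | 0.13 | .73 | – | – | – | – | 0.04 | .84 | – | – | – | – |  |  | – | – | – | – |
| G × L | 0.60 | .55 | – | – | – | – | 0.27 | .77 | – | – | – | – | 2.41 | .10 | – | – | – | – |
| G × R | 0.04 | .96 | – | – | – | – | 0.87 | .43 | – | – | – | – | 2.56 | .09 | – | – | – | – |
| G × B × L | 1.05 | .36 | – | – | – | – | 0.03 | .97 | – | – | – | – | 0.29 | .75 | – | – | – | – |
| G × B × R | 0.70 | .50 | – | – | – | – | 0.69 | .51 | – | – | – | – | 0.85 | .43 | – | – | – | – |
| G × L × R | 0.99 | .42 | – | – | – | – | 1.55 | .19 | – | – | – | – | 1.76 | .14 | – | – | – | – |
| G × B × L × R | 0.68 | .61 | – | – | – | – | 0.20 | .94 | – | – | – | – | 0.69 | .60 | – | – | – | – |
| G × C | – | – | – | – | – | – | 2.19 | .15 | – | – | – | – | 1.49 | .23 | – | – | – | – |
| G × B × C | – | – | – | – | – | – | 0.28 | .60 | – | – | – | – | 0.004 | .95 | – | – | – | – |
| G × C × L | – | – | – | – | – | – | 1.85 | .17 | – | – | – | – | 0.64 | .53 | – | – | – | – |
| G × C × R | – | – | – | – | – | – | 1.41 | .25 | – | – | – | – | 0.88 | .42 | – | – | – | – |
| G × B × C × L | – | – | – | – | – | – | 0.14 | .87 | – | – | – | – | 0.37 | .70 | – | – | – | – |
| G × B × C × R | – | – | – | – | – | – | 0.31 | .74 | – | – | – | – | 0.50 | .61 | – | – | – | – |
| G × C × L × R | – | – | – | – | – | – | 1.51 | .21 | – | – | – | – | 2.19 | .08 | – | – | – | – |
| G × B × C × L × R | – | – | – | – | – | – | 1.99 | .10 | – | – | – | – | 0.69 | .60 | – | – | – | – |
| B |  |  |  |  |  |  | 2.28 | .14 | 0.81 | .38 | 1.55 | .23 | 0.19 | .67 |  |  | 2.35 | .15 |
| B × L |  |  |  |  | 0.82 | .45 | 1.62 | .21 | 0.77 | .47 | 0.90 | .42 | 1.24 | .30 | 0.78 | .47 | 0.75 | .48 |
| B × R | 0.93 | .40 | 1.25 | .30 | 0.47 | .63 | 0.30 | .74 | 0.20 | .82 | 1.33 | .28 | 0.26 | .77 | 0.58 | .57 | 0.52 | .60 |
| B × L × R | 0.51 | .73 | 0.26 | .91 | 1.33 | .27 | 0.71 | .58 | 0.37 | .83 | 0.62 | .65 | 0.41 | .80 | 1.40 | .25 | 0.09 | .99 |
| C | – | – | – | – | – | – |  |  |  |  | 3.38 | .09 |  |  |  |  | 0.37 | .55 |
| C × B | – | – | – | – | – | – | 0.72 | .40 | 0.93 | .35 | 0.05 | .82 |  |  | 2.00 | .18 | 2.33 | .15 |
| C × L | – | – | – | – | – | – | 3.02 | .06 | 4.04 | .03 | 1.18 | .32 | 1.59 | .21 | 1.60 | .22 | 0.22 | .80 |
| C × R | – | – | – | – | – | – |  |  |  |  |  |  |  |  |  |  | 1.89 | .17 |
| C × B × L | – | – | – | – | – | – | 0.21 | .81 | 0.20 | .82 | 0.16 | .85 | 0.28 | .76 | 0.57 | .57 | 0.10 | .90 |
| C × B × R | – | – | – | – | – | – | 1.34 | .27 | 1.13 | .34 | 0.26 | .77 |  |  |  |  | 1.17 | .33 |
| C × B × L × R | – | – | – | – | – | – |  |  |  |  | 1.88 | .13 | 1.53 | .20 | 1.24 | .30 | 0.94 | .45 |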

Although the Condition × Anterior/Posterior interactions are significant in both musicians (MUS) and nonmusicians (NM) for the N400 component, the effects go in opposite directions in the two groups (typical N400 effect over parietal sites in MUS, reversed N400 effect over frontal sites in NM). Significant effects are printed in italics, and exact levels of significance are indicated except when the p values are below .001 (<.001). G = Group; B = Block; C = Condition; L = Laterality; R = Anterior/Posterior.

Results are presented first for the screening measures of cognitive ability and musical aptitude and second for the experimental tasks. For each experimental task, behavioral data are presented first, followed by the ERP data, except for the word learning phase, in which no behavioral data were recorded, and for the long-term memory tasks, in which no ERPs were recorded. Finally, for ERP data (except in the phonological categorization task, where the N100 component is of main interest), analysis of the N400 component is presented first, followed by analyses of the N200.

Screening Measures

Cognitive Ability

Psychometric data were evaluated by means of univariate ANOVAs. Results showed no significant Group differences regarding general reasoning abilities (i.e., progressive matrices, PM47; F(1, 28) = 1.37, p = .25), verbal working memory (reverse digit span; F(1, 28) = 2.88, p = .10), nor visual attention (F(1, 28) = 3.17, p = .09). By contrast, MUS (mean = 7.6, SD = 0.30) showed better short-term memory abilities than NM (mean = 6.7, SD = 0.30; direct digit span; F(1, 28) = 5.53, p = .03).

Musical Aptitude

A 2 × 2 ANOVA (i.e., 2 Groups × 2 Tasks) showed that MUS made fewer errors (6.7%, SD = 2.0) than NM (17.6%, SD = 2.0; main effect of Group: F(1, 28) = 14.71, p < .001), and all participants performed better on the rhythmic (9.8%, SD = 1.6) than on the melodic task (14.4%, SD = 2.0; main effect of Task: F(1, 28) = 4.19, p = .05) with no Group × Task interaction.

Experimental Tasks

Phonological Categorization Task

Behavioral data

Results of 2 × 2 × 4 ANOVAs [i.e., 2 Groups (MUS vs. NM) × 2 Orders (pre vs. post) × 4 Tasks (Voicing vs. Vowel length vs. Pitch vs. Aspiration)] showed that MUS (6.6%, SD = 2.2) made overall fewer errors compared with NM (19.1%, SD = 2.2; main effect of Group: F(1, 24) = 16.29, p < .001; Figure 2A). The influence of music training was largest in the pitch (MUS: 4.1%, SD = 4.8; NM: 25.8%, SD = 4.9; Tukey, p < .001) and aspiration tasks (MUS: 7.6%, SD = 5.5; NM: 28.7%, SD = 5.5; Tukey, p < .001; Group × Task interaction: F(3, 72) = 11.82, p < .001). Finally, only NM improved their level of performance from pre to post in the pitch task (Group × Task × Order interaction: F(3, 72) = 3.31, p = .03; NM: pre: 30.3%, SD = 4.1, post: 21.3%, SD = 3.3, Tukey, p = .02; MUS: pre: 4.5%, SD = 4.1, post: 3.7%, SD = 3.3, Tukey, p = .99). Analyses of RTs did not reveal significant Group differences (Figure 2B).

Figure 2. 

Percentages of errors (ERRs) and RTs in the different tasks are shown for musicians (MUS) in red and for nonmusician controls (NM) in black. For the phonological categorization task (A and B), results are illustrated for the premeasurement (beginning) and postmeasurement (end of main experimental session), separately for each task (voicing, vowel length, pitch, and aspiration). For the matching task (C and D), results for Match (solid lines) and Mismatch (MM: dotted lines) words are illustrated in the two blocks of the main experimental session and in the three blocks of the long-term memory session. For the semantic task (E and F), results for semantically Related (solid lines) and Unrelated (dotted lines) words are illustrated in the two blocks of the main experimental session and of the long-term memory session.

Electrophysiological data

N100 amplitudes were evaluated by means of a 2 × 2 × 4 × 3 × 3 ANOVA (i.e., 2 Groups × 2 Orders × 4 Tasks × 3 Laterality positions [left vs. midline vs. right] × 3 Anterior/Posterior positions [frontal vs. central vs. parietal]). Results revealed a significant Group × Task × Laterality interaction effect (F(6, 168) = 3.19, p = .005). Separate ANOVAs for each group showed that only MUS were characterized by a larger N100 in the aspiration task (MUS: −6.50 μV, SD = 2.59 and NM: −5.57 μV, SD = 1.81) compared with the other three tasks over the left hemisphere and at midline electrodes (Task × Laterality interaction: MUS: −5.47 μV, SD = 1.78; F(6, 84) = 3.13, p = .008, Tukey, both ps < .001; NM: −5.01 μV, SD = 1.74; F < 1; Figure 3A and B).

Figure 3. 

Phonological categorization. (A) N100 components at the Central (Cz) electrode are compared between tasks for musicians (MUS) and for nonmusician controls (NM). In this and subsequent figures, time in milliseconds is in abscissa and the amplitude of the effects in microvolt is in ordinate. Time zero corresponds to word onset and negativity is plotted upwards. Latency windows for statistical analyses are indicated with gray dotted lines and the level of significance is represented by stars with *p < .05, **p < .01, and ***p < .001 (red stars for MUS and black stars for NM). (B) Topographic voltage distribution maps illustrate the N100s to the words separately for each task and for MUS and NM. Voltage values are scaled from −8 to +8 μV.

Word Learning Phase

Electrophysiological data

The N400 as well as the N200 were evaluated by means of 2 × 2 × 3 × 3 ANOVAs (i.e., 2 Groups × 2 Blocks [1 vs. 2] × 3 Laterality × 3 Anterior/Posterior positions).
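As an illustration of how the dependent measures entering these ANOVAs could be derived, the sketch below computes mean amplitudes in the N200 (230–340 msec) and N400 (340–550 msec) windows and arranges the nine electrodes into the Anterior/Posterior × Laterality grid. The array layout, the half-open window convention, and the function names are assumptions for illustration; the original preprocessing pipeline is not described at this level of detail.

```python
import numpy as np

# Electrode order assumed for this sketch: rows = frontal, central, parietal;
# columns = left, midline, right.
ELECTRODES = ["F3", "Fz", "F4", "C3", "Cz", "C4", "P3", "Pz", "P4"]

N200_WINDOW_MS = (230, 340)
N400_WINDOW_MS = (340, 550)

def mean_amplitude(erp, times_ms, window_ms):
    """Mean amplitude per electrode within a latency window.

    erp       : array of shape (n_electrodes, n_samples), in microvolts
    times_ms  : array of shape (n_samples,), time relative to word onset
    window_ms : (start, end); samples with start <= t < end are averaged
                (half-open interval, an assumption of this sketch)
    """
    erp = np.asarray(erp, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    start, end = window_ms
    mask = (times_ms >= start) & (times_ms < end)
    if not mask.any():
        raise ValueError("No samples fall inside the requested window.")
    return erp[:, mask].mean(axis=1)

def to_region_grid(values):
    """Reshape nine electrode values (ordered as ELECTRODES) into a 3 x 3 grid.

    Row index    : 0 = frontal, 1 = central, 2 = parietal
    Column index : 0 = left,    1 = midline, 2 = right
    """
    values = np.asarray(values, dtype=float)
    if values.shape != (len(ELECTRODES),):
        raise ValueError("Expected one value per electrode.")
    return values.reshape(3, 3)
```

Averaging the grid over columns (`grid.mean(axis=1)`) gives one value per Anterior/Posterior level, and averaging over rows (`grid.mean(axis=0)`) gives one value per Laterality level.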

For all participants and in line with previous results, the N400 component was larger over frontal (−2.63 μV, SD = 0.87) and central (−2.52 μV, SD = 0.76) sites compared with parietal sites (−1.29 μV, SD = 0.66; Tukey, both ps < .001; main effect of Anterior/Posterior: F(2, 56) = 25.25, p < .001). In addition, the Group × Block × Laterality interaction effect was significant (F(2, 56) = 4.65, p = .01). Separate group analyses showed that only MUS showed significantly increased amplitudes from Block 1 to Block 2. This effect was localized over the left hemisphere and midline electrodes (MUS: Block 1: −1.54 μV, SD = 0.76 and Block 2: −2.16 μV, SD = 0.79; Block × Laterality interaction: F(2, 28) = 16.38, p < .001; Tukey, both ps < .001 and NM: Block 1: −2.34 μV, SD = 1.36 and Block 2: −2.91 μV, SD = 1.38; main effect of Block: F(1, 14) = 2.72, p = .12; Figure 4A and B).

Figure 4. 

Word learning phase. (A) ERPs recorded at left parietal (P3), central (Cz), and right frontal sites (F4) in Block 1 (solid line) and Block 2 (dotted line) are overlapped for all participants (ALL: black lines, top), and separately below for musicians (MUS: red lines) and nonmusician controls (NM: black lines). (B) Topographic voltage distribution maps of the differences between the two blocks (Block 2 minus Block 1) are illustrated for the ERP components of interest (N200, N400), separately for MUS and for NM. Voltage values are scaled from −1.0 to +1.0 μV.

Analyses of the N200 component did not reveal significant Group differences (main effect of Group: F(1, 28) = 1.24, p = .28), but all participants showed significantly increased amplitudes from Block 1 (−1.13 μV, SD = 1.22) to Block 2 (−1.79 μV, SD = 0.99; main effect of Block: F(1, 28) = 11.61, p = .002; Figure 4A and B).

Matching Task

Behavioral data

Results of three-way ANOVAs (i.e., 2 Groups × 2 Blocks × 2 Conditions [match vs. mismatch]) showed that ERRs did not significantly differ between the two groups (main effect of Group: F(1, 24) = 2.19, p = .15). However, all participants made overall fewer errors for match (14.2%, SD = 2.6) than for mismatch words (19.4%, SD = 2.2) and fewer errors in Block 2 (15.2%, SD = 2.1) than in Block 1 (18.5%, SD = 2.2; main effect of Condition: F(1, 24) = 7.68, p = .01; main effect of Block: F(1, 24) = 9.27, p = .006; Figure 2C). In line with ERRs, analyses of RTs did not reveal between-group differences (main effect of Group: F < 1), but RTs were overall faster for match (1041 msec, SD = 64) than for mismatch words (1080 msec, SD = 70; main effect of Condition: F(1, 24) = 5.90, p = .02) and faster in Block 2 (994 msec, SD = 61) than in Block 1 (1128 msec, SD = 73; main effect of Block: F(1, 24) = 60.45, p < .001; Figure 2D).

Electrophysiological data

The N400 as well as the N200 component were evaluated by means of 2 × 2 × 2 × 3 × 3 ANOVAs (2 Groups × 2 Blocks × 2 Conditions × 3 Laterality × 3 Anterior/Posterior positions).

Analysis of the N400 revealed a significant Group × Condition × Anterior/Posterior interaction effect (F(2, 56) = 3.14, p = .05). Results of separate group analyses showed larger N400 amplitudes in MUS for mismatch (−0.10 μV, SD = 1.82) compared with match words over centroparietal regions (0.67 μV, SD = 1.58; Condition × Anterior/Posterior interaction: F(2, 28) = 28.34, p < .001; Tukey, central: p = .02; parietal: p < .001). The opposite pattern was found in NM with larger N400 for match (−1.34 μV, SD = 1.42) than for mismatch words over frontocentral sites (−0.89 μV, SD = 1.53; Condition × Anterior/Posterior interaction: F(2, 28) = 6.38, p = .005; Tukey, frontal: p = .001; central: p = .03; Figure 5A and B).

Figure 5. 

Matching task. (A) Left: ERPs recorded at frontal (Fz) and parietal (Pz) sites are overlapped for Match (solid lines) and Mismatch (dotted lines) words for all participants across the two blocks of trials (ALL: black lines). Central and right: ERPs are presented separately for Block 1 and Block 2 and for musicians (MUS: red lines) and nonmusician controls (NM: black lines). (B) Difference waves (Mismatch minus Match) are overlapped for MUS (red) and NM (black) separately for Block 1 and for Block 2 at Fz and Pz. Topographic voltage distribution maps of the Mismatch minus Match differences are illustrated for N200 and N400 components separately and for MUS and NM in Block 1 and Block 2. Voltage values are scaled from −2.5 to +1.5 μV.

The N200 amplitude was overall smaller in MUS (−0.15 μV, SD = 0.60) compared with NM (−2.14 μV, SD = 0.60; main effect of Group: F(1, 28) = 5.56, p = .03) and the N200 effect (i.e., mismatch minus match words) was more widely distributed in MUS compared with NM (Group × Block × Condition × Laterality × Anterior/Posterior interaction: F(4, 112) = 1.99, p = .10; Figure 5A and B). MUS showed larger N200 amplitudes for mismatch (P4: −1.98 μV, SD = 0.76) than for match words (P4: 0.01 μV, SD = 0.68) over centroparietal scalp sites with largest differences over midline and right hemisphere (Condition × Anterior/Posterior interaction: F(2, 28) = 15.12, p < .001; Condition × Laterality interaction: F(2, 28) = 4.04, p = .03). In addition, the N200 effect was larger in Block 2 (P4: −2.22 μV, SD = 0.50) than in Block 1 (P4: −1.66 μV, SD = 0.63) over midline and right centroparietal sites (Condition × Block × Laterality × Anterior/Posterior interaction: F(4, 56) = 3.98, p = .007). NM also showed an N200 effect that was localized over parietal sites (−1.06 μV, SD = 1.41; Condition × Anterior/Posterior interaction: F(2, 28) = 6.12, p = .006).

Semantic Task

Behavioral data

Results of three-way ANOVAs (i.e., 2 Groups × 2 Blocks × 2 Conditions [related vs. unrelated]) showed that MUS (23.6%, SD = 2.0) made overall fewer errors than NM (30.5%, SD = 2.0; main effect of Group: F(1, 24) = 5.82, p = .02), and all participants made fewer errors for unrelated (22.6%, SD = 2.6) than for related words (31.5%, SD = 2.9; main effect of Condition: F(1, 24) = 11.24, p = .003; Figure 2E). Moreover, all participants made fewer errors in Block 2 (23.9%, SD = 2.4) than in Block 1 (30.2%, SD = 2.3; main effect of Block: F(1, 24) = 12.37, p = .002). RTs were faster for related (1210 msec, SD = 72) than for unrelated words (1342 msec, SD = 71; main effect of Condition: F(1, 24) = 41.32, p < .001) and faster in Block 2 (1159 msec, SD = 70) than in Block 1 (1393 msec, SD = 75; main effect of Block: F(1, 24) = 88.92, p < .001), with no between-group differences (main effect of Group: F < 1; Figure 2F).

Electrophysiological data

The N400 as well as the N200 component were evaluated by means of 2 × 2 × 2 × 3 × 3 ANOVAs (i.e., 2 Groups × 2 Blocks × 2 Conditions × 3 Laterality × 3 Anterior/Posterior positions).

N400 analyses revealed a significant Group × Anterior/Posterior interaction effect (F(2, 56) = 3.14, p = .05). As typically reported in the literature (Kutas et al., 1988), the N400 was larger for semantically unrelated (2.17 μV, SD = 1.93) than for related words (3.29 μV, SD = 1.66) over parietal sites in MUS (Condition × Anterior/Posterior interaction: F(2, 28) = 8.98, p < .001; Tukey, parietal: p < .001). By contrast, a reversed N400 effect was found in NM, with a larger N400 for related (−2.09 μV, SD = 1.60) than for unrelated words (−1.19 μV, SD = 1.06) over frontal sites (Condition × Anterior/Posterior interaction: F(2, 28) = 10.22, p < .001; Tukey, frontal: p = .002; Figure 6A and B).
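The direction of the N400 effects in the two groups can be made explicit with a small worked example based on the mean amplitudes reported for the matching and semantic tasks. The helper name and the dictionary layout are illustrative only. The effect is computed as the incongruent condition (mismatch or unrelated) minus the congruent condition (match or related), so that a negative value corresponds to the typical N400 effect. Note that the reported means come from different scalp regions in the two groups (centroparietal or parietal in MUS, frontocentral or frontal in NM).

```python
def n400_effect(incongruent_uv, congruent_uv):
    """Difference score in microvolts: incongruent minus congruent.

    Negative values indicate a larger (more negative) N400 for
    mismatch/unrelated words, i.e., the typical N400 effect.
    """
    return round(incongruent_uv - congruent_uv, 2)

# Mean amplitudes (in microvolts) as reported in the text:
# (incongruent, congruent) per task and group.
reported_means = {
    ("matching", "MUS"): (-0.10, 0.67),   # mismatch, match
    ("matching", "NM"): (-0.89, -1.34),   # mismatch, match
    ("semantic", "MUS"): (2.17, 3.29),    # unrelated, related
    ("semantic", "NM"): (-1.19, -2.09),   # unrelated, related
}

effects = {key: n400_effect(*pair) for key, pair in reported_means.items()}

for (task, group), effect in effects.items():
    direction = "typical" if effect < 0 else "reversed"
    print(f"{task:9s} {group:3s} {effect:+.2f} microvolts ({direction})")
```

Under this convention the difference is negative for MUS in both tasks and positive for NM in both tasks, which matches the typical versus reversed pattern described in the text.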

Figure 6. 

Semantic task. (A) Left: ERPs recorded at frontal (Fz) and parietal sites (Pz) are overlapped for semantically Related (solid lines) and Unrelated (dotted lines) words for all participants across the two blocks of trials (ALL: black lines). Central and right panels: ERPs are presented separately for Block 1 and Block 2 and for musicians (MUS: red lines) and nonmusician controls (NM: black lines). (B) Difference waves (Unrelated minus Related) are overlapped for MUS (red) and NM (black) separately for Block 1 and Block 2 at Fz and Pz. Topographic voltage distribution maps of the Unrelated minus Related differences are illustrated for N200 and N400 components separately and for MUS and NM in Block 1 and Block 2. Voltage values are scaled from −2.5 to +1.0 μV.

The N200 amplitude was larger in Block 2 than in Block 1 in MUS only (Group × Block: F(1, 28) = 7.26, p = .01; MUS: Block 1: 1.58 μV, SD = 2.62 and Block 2: 0.89 μV, SD = 2.68; F(1, 14) = 5.35, p = .04 and NM: Block 1: −0.11 μV, SD = 3.15 and Block 2: 0.39 μV, SD = 2.58; F(1, 14) = 2.35, p = .15; Figure 6A and B). In addition, the N200 was larger for unrelated than for related words in MUS but not in NM (main effect of Condition: MUS: Related: 1.88 μV, SD = 2.67 and Unrelated: 0.59 μV, SD = 3.03; F(1, 14) = 4.87, p = .05 and NM: Related: 0.30 μV, SD = 3.63 and Unrelated: −0.02 μV, SD = 2.23; F < 1; Group × Condition × Laterality × Anterior/Posterior interaction: F(4, 112) = 2.19, p = .08), and this effect was larger over central and parietal sites (Condition × Anterior/Posterior interaction: F(2, 28) = 5.05, p = .01; Tukey, both ps < .001).

Long-term Memory: Matching Task (Behavior)

Results of three-way ANOVAs (i.e., 2 Groups × 3 Blocks × 2 Conditions [match vs. mismatch]) showed that MUS (20.3%, SD = 2.8) made fewer errors compared with NM (28.6%, SD = 2.9; main effect of Group: F(1, 19) = 4.19, p = .05; Figure 2C). Moreover, all participants improved their level of performance from Block 1 (31.4%, SD = 2.6) to Blocks 2 and 3 (22.2%, SD = 3.4 and 19.8%, SD = 3.6, respectively; main effect of Block: F(2, 38) = 26.40, p < .001). No significant between-group differences were found on RTs but in both groups, RTs were faster in Block 3 (965 msec, SD = 67) than in Block 2 (1049 msec, SD = 72) and slowest in Block 1 (1168 msec, SD = 82; main effect of Block: F(2, 38) = 36.48, p < .001; Figure 2D).

Long-term Memory: Semantic Task (Behavior)

Results of three-way ANOVAs (i.e., 2 Groups × 2 Blocks × 2 Conditions [related vs. unrelated]) showed that between-group differences were not significant for either ERRs or RTs. However, all participants made fewer errors for unrelated (21.5%, SD = 6.0) than for related words (29.7%, SD = 6.3; main effect of Condition: F(1, 19) = 6.60, p = .02) and were faster for related (1055 msec, SD = 106) than for unrelated words (1151 msec, SD = 119; main effect of Condition: F(1, 19) = 14.54, p = .001; Figure 2E and F).

Relationships between Musical Aptitude, Behavioral Data, and Brain Activity

Error rates in the musicality test were correlated with the size of the N400 effects in the semantic task for MUS (r = .74, p = .01) but not for NM (r = .01, p = .97; Figure 7A and B). Moreover, the two groups differed significantly in both the intercepts (main effect of Group: F(1, 26) = 7.36, p = .01) and the slopes (Group × Musicality interaction: F(2, 26) = 6.52, p = .005) of the regression lines.

Figure 7. 

Correlation analyses. Correlations between the percentages of error in the musicality test (ERRs Musicality) and the sizes of the N400 effect (Unrelated minus Related) in the semantic task are illustrated for musicians (MUS: red) and for nonmusician controls (NM: black). Dotted lines represent the 95% confidence interval of the correlation line.

Summary of Results

Using an ecologically valid experimental design, we tested the general hypothesis that professional musicians would learn the meaning of novel words more efficiently than control participants. Overall, both behavioral and electrophysiological data support this hypothesis. Behaviorally, musicians performed better than controls in the musicality and phonological categorization tasks. In addition, although all participants performed similarly in the matching task, musicians made significantly fewer errors in the semantic task. Finally, after 5 months, musicians remembered more words than controls, as reflected by lower error rates in the matching task.

The electrophysiological markers of word learning also clearly differed between the two groups. Although control participants showed similar N100 amplitudes in all phonological categorization tasks, musicians showed an increase in N100 amplitude when categorizing the most difficult aspiration contrast. Most importantly and in line with the word learning literature, both groups showed enhanced N400 amplitudes over frontal scalp sites when learning the meaning of novel words. However, only musicians were additionally characterized by larger left-lateralized N400s in the second block compared with the first block of the word learning phase. Finally, only musicians showed the typical centroparietal distribution of the N400 effect in the matching and semantic tasks. By contrast, nonmusicians showed reversed N400 effects in both tasks over frontal sites. These findings are discussed in detail below. It is, however, important to note that cross-sectional studies, such as the one reported here, are necessary to first demonstrate differences between musically trained and untrained participants before designing longitudinal experiments to test for the causality of the reported effects.

Spatiotemporal Dynamics in the Learning Phase

Results showed clear evidence for fast brain plasticity, as reflected by the rapid development of the N400 in both groups of participants after only 3 min of novel word learning (Block 1), that is after 10 repetitions of each picture–word association (see Figure 4). This finding extends previous results on word learning showing N400 enhancements when learning a second language (McLaughlin et al., 2004), the meaning of rare words (Perfetti et al., 2005), and when learning the meaning of novel words or pseudowords from highly constrained sentence contexts (Borovsky et al., 2010, 2012; Batterink & Neville, 2011; Mestres-Missé et al., 2007). Importantly, and in line with previous work in adults (Borovsky et al., 2012; Mestres-Missé et al., 2007) and in children (François et al., 2013; Friedrich & Friederici, 2008), in both musicians and controls the N400 component to novel words was larger frontocentrally than parietally. These results are compatible with previous findings, suggesting that prefrontal and temporal brain regions are associated with the maintenance of novel information in working memory (Hagoort, 2014) and the acquisition of word meaning (Rodriguez-Fornells et al., 2009).

Importantly, however, the N400 increase from the first to the second block of novel word learning was only significant in musicians (see Figure 4A). In line with our hypothesis, this is taken to suggest faster encoding of novel word meaning in musicians than in controls. In addition, word learning in musicians was accompanied by a rapid shift of the N400 component distribution from frontal to centroparietal sites. This shift in distribution is in line with the hypothesis that musicians have already integrated novel word representations in semantic memory (Batterink & Neville, 2011) and that the N400 centroparietal distribution in musicians reflects access to semantic memory (Kutas & Federmeier, 2011). By contrast, in control participants, the N400 component remained larger frontally throughout the entire learning phase, suggesting that nonmusicians had not yet integrated the novel words' meaning into established semantic networks. This interpretation can be directly tested in future experiments by increasing the number of repetitions of picture–word associations. Under such conditions, a frontal to centroparietal shift in N400 distribution should also be found in nonmusicians.

Evidence for Rapidly Established Representations of Novel Words in the Matching and Semantic Tasks

As expected, all participants, whether musicians or controls, were able to learn the nine picture–word associations, as reflected by error rates below chance level (i.e., 50% errors) in both the matching (mean = 17%) and semantic tasks (mean = 27%; see Figure 2C and E). To follow the dynamics of word learning within each task, results were compared between the first and the second block of trials. Interestingly, the level of performance increased with repetition in both tasks, indicating that learning was still ongoing. Moreover, both musicians and controls showed clear matching effects with lower error rates and faster RTs for match compared with mismatch words (Boddy & Weinberg, 1981). However, and in contrast to typical semantic priming effects (Meyer & Schvaneveldt, 1971), both groups of participants made more errors for semantically related words than for semantically unrelated words. Although unexpected, this result may reflect a response bias towards rejection (i.e., considering the word as unrelated to the picture) as a consequence of task difficulty generating higher uncertainty (Gigerenzer, 1991). In other words, when participants were not certain whether the pictures and the words were semantically related (e.g., “honey” and “bear”), they tended to respond that they were unrelated. By contrast, results for RTs conform to the literature (i.e., faster RTs for semantically related than for unrelated words; Meyer & Schvaneveldt, 1971). Although the presence of a speed–accuracy trade-off limits the interpretation of this priming effect, faster RTs for semantically related words indicate that new pictures that had not been seen before in the experiment did activate the representations of semantically related newly learned words.
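As an illustration only, and not the analysis reported here, comparing one participant's error count with the 50% chance level can be sketched as an exact one-sided binomial test. The function name and the trial and error counts below are hypothetical and were chosen only to mirror the order of magnitude of the mean error rate in the semantic task.

```python
from math import comb


def binomial_p_below_chance(n_errors: int, n_trials: int, chance: float = 0.5) -> float:
    """Exact one-sided binomial p-value: P(X <= n_errors) if errors occur at chance."""
    if not 0 <= n_errors <= n_trials:
        raise ValueError("n_errors must lie between 0 and n_trials")
    return sum(
        comb(n_trials, k) * chance**k * (1 - chance) ** (n_trials - k)
        for k in range(n_errors + 1)
    )


# Hypothetical participant: 27 errors in 100 two-choice trials (27% errors).
p_value = binomial_p_below_chance(27, 100)
print(f"P(27 or fewer errors in 100 trials under chance) = {p_value:.2e}")
```

A small p-value indicates that so few errors would be unlikely if the participant were guessing; group-level inference would still require the appropriate between-participant statistics.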

Turning to the influence of music training, musicians and nonmusician controls performed similarly in the matching task but musicians outperformed controls in the semantic task. This contrastive pattern of results suggests that the two tasks tap into different memory systems. To decide whether the newly learned word matched the picture in the matching task, participants had to retrieve the specific picture–word associations that were stored in episodic memory during the word learning phase. By contrast, in the semantic task participants had to retrieve general information from semantic memory because the novel pictures that were presented before the newly learned words had not been seen before in the experiment. In line with the centroparietal shift in N400 distribution observed at the end of the word learning phase, the finding that musicians outperformed nonmusicians in the semantic task is taken as evidence that musicians had already integrated the novel words' meanings into semantic memory so that priming effects generalized to new pictures.

ERPs in the matching and semantic tasks also add support to this interpretation. In musicians and for both tasks, the N400 over centroparietal sites was larger for unexpected (mismatch/unrelated) than for expected words (match/related; Figures 5 and 6). This sensitivity to word characteristics and this scalp distribution correspond to the N400 component, typically considered as the electrophysiological marker of the integration of novel words' meanings into semantic memory (Borovsky et al., 2012; Batterink & Neville, 2011; Mestres-Missé et al., 2007) and “as reflecting the activity in a multimodal long-term memory system that is induced by a given input stimulus during a delimited time window as meaning is dynamically constructed” (Kutas & Federmeier, 2011, p. 22; see also Steinbeis & Koelsch, 2008, for N400s elicited by single chords incongruous with the preceding context). By contrast, in nonmusician controls, the N400 effect was reversed over frontocentral sites in both tasks, with a larger N400 for expected (match/related) than for unexpected words (mismatch/unrelated). This finding was surprising based on previous results showing that the N400 is larger for unrelated than for related words in both lexical decision tasks (Borovsky et al., 2012) and semantic priming experiments (Mestres-Missé et al., 2007). Because the amount of music training was not controlled in these experiments, a possible explanation is that some of the participants had musical skills, hence influencing the results. However, stimulus and task differences are more likely to account for this discrepancy. Most experiments on ERPs and word learning in adults used similar designs with novel words embedded in sentence contexts. Here and as in experiments conducted with children (Friedrich & Friederici, 2008; Torkildsen et al., 2008) and with adults (Dobel et al., 2009), participants learned novel words through picture–word associations. It is thus possible that this learning mode was more demanding than learning the meaning of novel words from sentence contexts. Along these lines, we interpret the reversed N400 effect in nonmusicians in the matching and semantic tasks as showing that nonmusicians had not yet fully learned the picture–word associations (as reflected by the frontal distribution of the N400 in these tasks that is similar to the frontal N400 found during word learning) and that they were still building up new episodic memory traces based on the correct information provided by the matching words (as reflected by a larger N400 increase from the first to the second block for match than for mismatch words; see Figure 5A). As mentioned above, these interpretations can be tested by increasing the number of trials in the learning phase as well as in the matching and semantic tasks and by comparing different word learning designs (e.g., novel words in sentence contexts vs. picture–word associations).
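To make the sign convention behind a "typical" versus "reversed" N400 effect concrete, the following minimal sketch assumes that the effect is quantified as the difference in mean amplitude (unexpected minus expected words) within a latency window. The window bounds, function names, and waveform values are hypothetical placeholders, not the parameters or data of the present experiment.

```python
def mean_amplitude(times_ms, amplitudes_uv, start_ms, end_ms):
    """Mean amplitude (in microvolts) of the samples inside [start_ms, end_ms]."""
    selected = [a for t, a in zip(times_ms, amplitudes_uv) if start_ms <= t <= end_ms]
    if not selected:
        raise ValueError("no samples fall inside the requested window")
    return sum(selected) / len(selected)


def n400_effect(times_ms, unexpected_uv, expected_uv, start_ms=300, end_ms=500):
    """Unexpected minus expected mean amplitude in the (assumed) N400 window.

    Negative values: larger (more negative) N400 for unexpected words (typical effect).
    Positive values: larger N400 for expected words (reversed effect).
    """
    return (mean_amplitude(times_ms, unexpected_uv, start_ms, end_ms)
            - mean_amplitude(times_ms, expected_uv, start_ms, end_ms))


# Hypothetical averaged waveforms at one electrode (toy values).
times = [0, 100, 200, 300, 400, 500, 600]
unexpected = [0.0, 0.0, -1.0, -4.0, -5.0, -3.0, -1.0]
expected = [0.0, 0.0, -1.0, -2.0, -2.0, -1.0, 0.0]

typical = n400_effect(times, unexpected, expected)   # negative: typical effect
reversed_ = n400_effect(times, expected, unexpected)  # positive: reversed effect
```

Under this convention, the pattern described for musicians corresponds to a negative difference over centroparietal sites, and the pattern described for nonmusician controls corresponds to a positive difference over frontocentral sites.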

Finally, although the main focus of this experiment was on the modulations of N400 amplitude with novel word learning, results also revealed interesting effects on the N200 component. In both groups, the N200 components in the matching task were larger for mismatching than for matching words over parietal sites (see Figure 5). Insofar as the N200 has been associated with categorization processes (Friedrich & Friederici, 2008), mismatching words possibly required more effortful categorization processes than matching words. However, there is also evidence that the N200 component reflects early contextual influences (van den Brink, Brown, & Hagoort, 2001) and phonological processing (Connolly & Phillips, 1994). Thus, mismatching words possibly elicited larger N200 than matching words because they were unexpected based on the picture context and/or at the phonological level. In the semantic task, the increase in N200 amplitude for semantically unrelated compared with related words was only found in musicians, again suggesting that word categorization was possibly more efficient in musicians and/or that musicians were more sensitive to the context (i.e., the picture) or to the phonological characteristics of novel words than nonmusicians.

Evidence for Long-lasting Representations of Novel Words

To our knowledge, this is the first experiment comparing long-term memory (after 5 months) for novel words in musicians and controls. Clearly, results need to be considered with caution because of the high dropout rate. Nevertheless, they point to improved long-term memory in musicians compared with controls in the matching task (lower error rates), which was always performed first. The memory traces of the words that had been learned 5 months earlier therefore seem stronger in musicians than in nonmusicians. By contrast, no between-group differences were found in the semantic task, possibly because both groups of participants similarly benefited from the reactivation of memory traces during the matching task. Taken together, these results point to long-lasting effects of rapidly established word representations during word learning, and they open new perspectives to further test for the influence of music training on long-term memory.

Evidence for Transfer Effects from Music Training to Word Learning

In summary, behavioral and electrophysiological data showed that music training improved several aspects of word learning. How can we account for the influence of music training on novel word learning?

Cascading Effects from Perception to Word Learning

The first interpretation, in terms of cascading effects, is that enhanced auditory perception and auditory attention (Strait et al., 2015) in musicians drive the facilitation observed in word learning through different subsequent steps (i.e., building up new phonological representations and attaching meaning to them, storing this new information in short- and long-term memory). In support of this interpretation, the error rate in the musicality test was correlated with the size of the N400 effect in the semantic task in musicians but not in controls (Figure 7), thereby clearly pointing to a relationship between auditory perception/attention and word learning.
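For readers who want to see how such a brain–behavior relationship is typically quantified, the sketch below computes a Pearson correlation between per-participant error rates and N400 effect sizes. It is an assumption that a Pearson coefficient matches the statistic used here, and the function name and numerical values are hypothetical toy data, not the data shown in Figure 7.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical values: musicality-test error rates (%) and N400 effect sizes (microvolts).
error_rates = [5.0, 8.0, 12.0, 15.0, 20.0]
n400_effects = [-3.1, -2.6, -2.4, -1.5, -0.9]

r = pearson_r(error_rates, n400_effects)
print(f"r = {r:.2f}")
```

Computing the coefficient separately for each group, as in the text, is what allows a relationship to be present in musicians but absent in controls.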

Moreover, in line with the hypothesis of improved speech discrimination in musicians when the task is most difficult (Diamond, 2013; Schön et al., 2004), musicians outperformed control participants in the phonological categorization of tonal and aspiration contrasts, but both groups performed equally well for simple voicing contrasts (/Ba/ vs. /Pa/). Thus, music training was most useful to discriminate phonemes that are contrastive in Thai but that do not belong to the French phonemic repertoire (Figure 2A). It should be noted in this respect that, although vowel length is not contrastive in French, controls nevertheless performed as well as musicians in this task, possibly because the natural difference in vowel duration was large enough (270 msec) to be easily perceived by nonmusician controls. Taken together, these findings add support to the hypothesis of transfer effects between music and speech (Asaridou & McQueen, 2013; Besson et al., 2011), with near transfer for acoustic cues common to music and speech such as frequency and with far transfer to unfamiliar linguistic cues such as aspiration.

Note also that the N100 amplitude in musicians was enhanced for the nonnative aspiration contrast (see Figure 3). Because this task was more difficult for French native speakers than the other tasks, as revealed by behavioral data, the increased N100 may reflect increased focused attention (Strait et al., 2015) and mobilization of neuronal resources in musicians. An alternative but complementary interpretation is that this increase in N100 amplitude reflected increased neural synchronicity and structural connectivity in musicians, who are typically more sensitive to the acoustic-phonetic properties of speech sounds than nonmusicians, who showed similar N100 components in all tasks (Bidelman et al., 2014; Elmer et al., 2013; Chobert et al., 2012; Musacchia et al., 2007; Wong & Perrachione, 2007).

Finally, and in line with the cascade hypothesis, it may be that the very nature of the stimuli (monosyllabic Thai words) is at least partly responsible for the musicians' advantage in the semantic task. Because musicians are more sensitive than nonmusicians to variations in pitch, duration, and VOT, this possibly helped them build precise phonological representations that were then more easily associated with a novel word meaning. One way to test this interpretation would be to use a similar design but with written instead of spoken words, thereby potentially reducing the musicians' acoustic-phonetic advantage.

Multidimensional Facilitation Effects

The second interpretation, which is by no means contradictory but rather complementary to the cascading interpretation, is that the multidimensional nature of music training independently improved the several functions that are necessary for word learning. There is already some evidence in the literature for enhanced working and verbal memory (George & Coch, 2011; Ho et al., 2003) and executive functions (Zuk et al., 2014; Moreno et al., 2011; Rogalsky et al., 2011; Pallesen et al., 2010) with musicianship. In line with these results, musicians showed enhanced auditory short-term memory (digit span; George & Coch, 2011). Moreover, as reported in previous experiments, we also found significant between-group differences in the ERP component related to semantic integration and memory (N400; Mestres-Missé et al., 2007; Perfetti et al., 2005; McLaughlin et al., 2004). However, in contrast to previous results, we found no evidence for increased working memory (reversed digit span; George & Coch, 2011) or for improved nonverbal intelligence (PM47) in adult musicians. Note that the assessment of general cognitive abilities was quite limited in the present experiment because of time constraints. It is thus possible that general cognitive abilities other than the ones tested here could have influenced the observed between-group differences in word learning. Future experiments will aim at including more tests targeting different cognitive functions (selective and sustained attention; working, short-term, and long-term memory; executive functions). Finally, it could also be argued that musicians performed better than nonmusicians because they were generally more motivated (Corrigall, Schellenberg, & Misura, 2013). Although this is difficult to control for, it is important to note that participants were not informed about the aim of the experiment until the end of the session; they were told only that we were interested in language, music, and the brain. Thus, as they discovered task by task what the experimental session was about, it is unlikely that general motivation accounted for the present behavioral and electrophysiological effects.

Conclusions

In summary, by recording ERPs during both learning and test phases (matching and semantic tasks) of novel word learning, results revealed fast changes in brain activity after only a few minutes of exposure to picture–word associations. Moreover, these changes were stronger in musicians than in controls. Specifically, the frontoparietal shift of the N400 in the word learning phase (i.e., without motor responses) only developed in musicians, which we interpret as an electrophysiological signature of “fast mapping” (Carey, 1978). To our knowledge, this is the first report showing that music training influences semantic processing. As a future step, we plan to use a longitudinal approach with nonmusician controls trained with music so as to test for the causal influence of music training on word learning. Finally, these results also open new perspectives to further investigate the influence of music training on long-term memory for applications of music training in the domain of native and second language learning (Moreno, Lee, Janus, & Bialystok, 2015; Chobert & Besson, 2013) and for using this type of experimental design in clinical research to specify the stages of word learning that are most deficient.

We would like to thank all the participants, Chotiga Pattamadilok for recording the auditory stimuli, and Benjamin Furnari for his help with data acquisition and analyses. The work of E. D., M. Ba., M. D. I., and M. B., carried out within the Labex BLRI (ANR-11-LABX-0036), has benefited from support from the French government, managed by the French National Agency for Research (ANR), under the program “Investissements d'Avenir” (ANR-11-IDEX-0001-02). E. D. was supported by a doctoral fellowship from the BLRI and M. Ba. by a doctoral fellowship from the French Ministry of Research and Education.

Reprint requests should be sent to Eva Dittinger, Laboratoire de Neurosciences Cognitives, Université Aix Marseille, Centre Saint Charles, 3 Place Victor Hugo, Marseille, France, 13331, or via e-mail: [email protected].

1. Following phonetic transcription in Thai, 1 refers to low-tone, 0 to midtone, ph to aspirated voicing, and the colon to long vowel duration.

2. Voice Onset Time (VOT) is defined as the interval between the noise burst produced at consonant release and the waveform periodicity associated with vocal cord vibrations (Lisker & Abramson, 1967).

3. Pictures were chosen from the Snodgrass and Vanderwart (1980) picture set but were retrieved from the Internet to ensure sufficient resolution and quality.

Asaridou, S. S., & McQueen, J. M. (2013). Speech and music shape the listening brain: Evidence for shared domain-general mechanisms. Frontiers in Psychology, 4, 321.
Batterink, L., & Neville, H. (2011). Implicit and explicit mechanisms of word learning in a narrative context: An event-related potential study. Journal of Cognitive Neuroscience, 23, 3181–3196.
Besson, M., Chobert, J., & Marie, C. (2011). Transfer of training between music and speech: Common processing, attention, and memory. Frontiers in Psychology, 2, 94.
Bidelman, G. M., Weiss, M. W., Moreno, S., & Alain, C. (2014). Coordinated plasticity in brainstem and auditory cortex contributes to enhanced categorical speech perception in musicians. European Journal of Neuroscience, 40, 2662–2673.
Boddy, J., & Weinberg, H. (1981). Brain potentials, perceptual mechanism and semantic categorization. Biological Psychology, 12, 43–61.
Boersma, P., & Weenink, D. (2011). Praat: Doing phonetics by computer [computer program]. Retrieved from www.praat.org.
Borovsky, A., Elman, J., & Kutas, M. (2012). Once is enough: N400 indexes semantic integration of novel word meanings from a single exposure in context. Language Learning and Development, 8, 278–302.
Borovsky, A., Kutas, M., & Elman, J. (2010). Learning to use words: Event-related potentials index single-shot contextual word learning. Cognition, 116, 289–296.
Carey, S. (1978). The child as word learner. In M. Halle, J. Bresnan, & G. A. Miller (Eds.), Linguistic theory and psychological reality (pp. 264–293). Cambridge, MA: MIT Press.
Chobert, J., & Besson, M. (2013). Musical expertise and second language learning. Brain Sciences, 3, 923–940.
Chobert, J., François, C., Velay, J. L., & Besson, M. (2012). Twelve months of active musical training in 8- to 10-year-old children enhances the preattentive processing of syllabic duration and voice onset time. Cerebral Cortex, 24, 956–967.
Chobert, J., Marie, C., François, C., Schön, D., & Besson, M. (2011). Enhanced passive and active processing of syllables in musician children. Journal of Cognitive Neuroscience, 23, 3874–3887.
Connolly, J. F., & Phillips, N. A. (1994). Event-related potential components reflect phonological and semantic processing of the terminal words of spoken sentences. Journal of Cognitive Neuroscience, 6, 256–266.
Corrigall, K. A., Schellenberg, E. G., & Misura, N. M. (2013). Music training, cognition, and personality. Frontiers in Psychology, 4, 222.
Diamond, A. (2013). Executive functions. Annual Review of Psychology, 64, 135–168.
Dobel, C., Lagemann, L., & Zwitserlood, P. (2009). Non-native phonemes in adult word learning: Evidence from the N400m. Philosophical Transactions of the Royal Society, Series B, Biological Sciences, 364, 3697–3709.
Dumay, N., & Gaskell, M. G. (2007). Sleep-associated changes in the mental representation of spoken words. Psychological Science, 18, 35–39.
Elmer, S., Hänggi, J., Meyer, M., & Jäncke, L. (2013). Increased cortical surface area of the left planum temporale in musicians facilitates the categorization of phonetic and temporal speech sounds. Cortex, 49, 2812–2821.
Elmer, S., Meyer, M., & Jäncke, L. (2012). Neurofunctional and behavioral correlates of phonetic and temporal categorization in musically trained and untrained subjects. Cerebral Cortex, 22, 650–658.
Fitzroy, A. B., & Sanders, L. D. (2013). Musical expertise modulates early processing of syntactic violations in language. Frontiers in Psychology, 3, 603.
François, C., Chobert, J., Besson, M., & Schön, D. (2013). Music training for the development of speech segmentation. Cerebral Cortex, 23, 2038–2043.
Friederici, A. D., & Singer, W. (2015). Grounding language processing on basic neurophysiological principles. Trends in Cognitive Sciences, 19, 329–338.
Friedrich, M., & Friederici, A. D. (2008). Neurophysiological correlates of online word learning in 14-month-old infants. NeuroReport, 19, 1757–1761.
Gandour, J., Wong, D., Lowe, M., Dzemidzic, M., Satthamnuwong, N., Tong, Y., et al. (2002). A crosslinguistic fMRI study of spectral and temporal cues underlying phonological processing. Journal of Cognitive Neuroscience, 14, 1076–1087.
Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and nonmusicians. Journal of Neuroscience, 23, 9240–9245.
George, E. M., & Coch, D. (2011). Music training and working memory: An ERP study. Neuropsychologia, 49, 1083–1094.
Gigerenzer, G. (1991). How to make cognitive illusions disappear: Beyond “heuristics and biases”. European Review of Social Psychology, 2, 83–115.
Hagoort, P. (2014). Nodes and networks in the neural architecture for language: Broca's region and beyond. Current Opinion in Neurobiology, 28, 136–141.
Ho, Y. C., Cheung, M. C., & Chan, A. S. (2003). Music training improves verbal but not visual memory: Cross-sectional and longitudinal explorations in children. Neuropsychology, 17, 439–450.
Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., et al. (2009). Musical training shapes structural brain development. Journal of Neuroscience, 29, 3019–3025.
Jäncke, L. (2009). Music drives brain plasticity. F1000 Biology Reports, 1, 1–6.
Jasper, H. H. (1958). The ten–twenty system of the International Federation. Clinical Neurophysiology, 10, 371–375.
Jentschke, S., & Koelsch, S. (2009). Musical training modulates the development of syntax processing in children. Neuroimage, 47, 735–744.
Koelsch, S., Kasper, E., Sammler, D., Schulze, K., Gunter, T., & Friederici, A. D. (2004). Music, language and meaning: Brain signatures of semantic processing. Nature Neuroscience, 7, 302–307.
Korkman, M., Kirk, U., & Kemp, S. (1998). NEPSY: A developmental neuropsychological assessment. San Antonio, TX: The Psychological Corporation.
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nature Reviews Neuroscience, 11, 599–605.
Kühnis, J., Elmer, S., & Jäncke, L. (2014). Auditory evoked responses in musicians during passive vowel listening are modulated by functional connectivity between bilateral auditory-related brain regions. Journal of Cognitive Neuroscience, 26, 2750–2761.
Kutas, M., & Federmeier, K. D. (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology, 62, 621–647.
Kutas, M., & Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207, 203–205.
Kutas, M., Van Petten, C., & Besson, M. (1988). Event-related potential asymmetries during the reading of sentences. Electroencephalography and Clinical Neurophysiology, 69, 218–233.
Lisker, L., & Abramson, A. S. (1967). Some effects of context on voice onset time in English stops. Language and Speech, 10, 1–28.
Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). “Musical syntax” is processed in the area of Broca: An MEG-study. Nature Neuroscience, 4, 540–545.
Marie, C., Delogu, F., Lampis, G., Olivetti Belardinelli, M., & Besson, M. (2011). Influence of musical expertise on segmental and tonal processing in Mandarin Chinese. Journal of Cognitive Neuroscience, 23, 2701–2715.
Marques, C., Moreno, S., Castro, S. L., & Besson, M. (2007). Musicians detect pitch violation in a foreign language better than nonmusicians: Behavioural and electrophysiological evidence. Journal of Cognitive Neuroscience, 19, 1453–1463.
McLaughlin, J., Osterhout, L., & Kim, A. (2004). Neural correlates of second-language word learning: Minimal instruction produces rapid change. Nature Neuroscience, 7, 703–704.
Mestres-Missé, A., Rodriguez-Fornells, A., & Münte, T. F. (2007). Watching the brain during meaning acquisition. Cerebral Cortex, 17, 1858–1866.
Meyer, D. E., & Schvaneveldt, R. W. (1971). Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 20, 227–234.
Moreno, S., Bialystok, E., Barac, R., Schellenberg, E. G., Cepeda, N. J., & Chau, T. (2011). Short-term music training enhances verbal intelligence and executive function. Psychological Science, 22, 1425–1433.
Moreno, S., Lee, Y., Janus, M., & Bialystok, E. (2015). Short-term second language and music training induces lasting functional brain changes in early childhood. Child Development, 86, 394–406.
Moreno, S., Marques, C., Santos, A., Santos, M., Castro, S. L., & Besson, M. (2009). Musical training influences linguistic abilities in 8-year-old children: More evidence for brain plasticity. Cerebral Cortex, 19, 712–723.
Münte, T. F., Altenmüller, E., & Jäncke, L. (2002). The musician's brain as a model of neuroplasticity. Nature Reviews Neuroscience, 3, 473–478.
Musacchia, G., Sams, M., Skoe, E., & Kraus, N. (2007). Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proceedings of the National Academy of Sciences, U.S.A., 104, 15894–15898.
Pallesen, K. J., Brattico, E., Bailey, C. J., Korvenoja, A., Koivisto, J., Gjedde, A., et al. (2010). Cognitive control in auditory working memory is enhanced in musicians. PLoS One, 5, 11120.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). Increased auditory cortical representation in musicians. Nature, 23, 811–814.
Patel, A. D. (2008). Music, language, and the brain. New York: Oxford University Press.
Peretz, I., Champod, A. S., & Hyde, K. (2003). Varieties of musical disorders. The Montreal Battery of Evaluation of Amusia. Annals of the New York Academy of Sciences, 999, 58–75.
Peretz, I., Vuvan, D., Lagrois, M.-É., & Armony, J. L. (2015). Neural overlap in processing music and speech. Philosophical Transactions of the Royal Society, Series B, Biological Sciences, 370, 20140090.
Perfetti, C. A., Wlotko, E. W., & Hart, L. A. (2005). Word learning and individual differences in word learning reflected in event-related potentials. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1281–1292.
Raven, J. C., Corporation, P., & Lewis, H. K. (1962). Coloured progressive matrices: Sets A, AB, B. London, UK: Oxford Psychologist Press.
Rodriguez-Fornells, A., Cunillera, T., Mestres-Missé, A., & De Diego-Balaguer, R. (2009). Neuropsychological mechanisms involved in language learning in adults. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 364, 3711–3735.
Rogalsky, C., Rong, F., Saberi, K., & Hickok, G. (2011). Functional anatomy of language and music perception: Temporal and structural factors investigated using functional magnetic resonance imaging. Journal of Neuroscience, 31, 3843–3852.
Schellenberg, E. G. (2004). Music lessons enhance IQ. Psychological Science, 15, 511–514.
Schneider, P., Scherg, M., Dosch, H. G., Specht, H. J., Gutschalk, A., & Rupp, A. (2002). Morphology of Heschl's gyrus reflects enhanced activation in the auditory cortex of musicians. Nature Neuroscience, 5, 688–694.
Schön, D., Magne, C., & Besson, M. (2004). The music of speech: Music training facilitates pitch processing in both music and language. Psychophysiology, 41, 341–349.
Seppänen, M., Hämäläinen, J., Pesonen, A. K., & Tervaniemi, M. (2012). Music training enhances rapid neural plasticity of N1 and P2 source activation for unattended sounds. Frontiers in Human Neuroscience, 6, 43.
Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.
Steinbeis, N., & Koelsch, S. (2008). Comparing the processing of music and language meaning using EEG and fMRI provides evidence for similar and distinct neural representations. PLoS One, 3, 2226.
Strait, D. L., Parbery-Clark, A., O'Connell, S., & Kraus, N. (2013). Biological impact of preschool music classes on processing speech in noise. Developmental Cognitive Neuroscience, 6, 51–60.
Strait, D. L., Slater, J., O'Connell, S., & Kraus, N. (2015). Music training relates to the development of neural mechanisms of selective auditory attention. Developmental Cognitive Neuroscience, 12, 94–104.
Torkildsen, J. V. K., Svangstu, J. M., Friis Hansen, H., Smith, L., Simonsen, H. G., Moen, I., et al. (2008). Productive vocabulary size predicts event-related potential correlates of fast mapping in 20-month-olds. Journal of Cognitive Neuroscience, 20, 1266–1282.
van den Brink, D., Brown, C. M., & Hagoort, P. (2001). Electrophysiological evidence for early contextual influences during spoken-word recognition: N200 versus N400 effects. Journal of Cognitive Neuroscience, 13, 967–985.
Wagner, A. W., Schacter, D. L., Rotte, M., Koutstaal, W., Maril, A., Dale, A. M., et al. (1998). Building memories: Remembering and forgetting of verbal experiences as predicted by brain activity. Science, 281, 1188–1190.
Wang, X., Ossher, L., & Reuter-Lorenz, P. A. (2015). Examining the relationship between skilled music training and attention. Consciousness and Cognition, 36, 169–179.
Wechsler, D. (2003). Wechsler Intelligence Scale for children (4th ed.). San Antonio, TX: The Psychological Corporation.
Woldorff, M., & Hillyard, S. A. (1991). Modulation of early auditory processing during selective listening to rapidly presented tones. Electroencephalography and Clinical Neurophysiology, 79, 170–191.
Wong, P. C. M., & Perrachione, T. K. (2007). Learning pitch patterns in lexical identification by native English-speaking adults. Applied Psycholinguistics, 28, 565–585.
Zuk, J., Benjamin, C., Kenyon, A., & Gaab, N. (2014). Behavioral and neural correlates of executive functioning in musicians and non-musicians. PLoS One, 9, 99868.

Author notes

* Shared last authorship.