Growing evidence suggests that semantic knowledge is represented in distributed neural networks that include modality-specific structures. Here, we examined the processes underlying the acquisition of words from different semantic categories to determine whether the emergence of visual- and action-based categories could be tracked back to their acquisition. For this, we applied correspondence analysis (CA) to ERPs recorded at various moments during acquisition. CA is a multivariate statistical technique typically used to reveal distance relationships between words of a corpus. Applied to ERPs, it allows isolating factors that best explain variations in the data across time and electrodes. Participants were asked to learn new action and visual words by associating novel pseudowords with the execution of hand movements or the observation of visual images. Words were probed before and after training on two consecutive days. To capture processes that unfold during lexical access, CA was applied to the 100–400 msec post-word onset interval. CA isolated two factors that organized the data as a function of test sessions and word categories. Conventional ERP analyses further revealed a category-specific increase in the negativity of the ERPs to action and visual words at the frontal and occipital electrodes, respectively. The distinct neural processes underlying action and visual words can thus be tracked back to the acquisition of word-referent relationships and may have their origin in association learning. Given current evidence for the flexibility of language-induced sensory-motor activity, we argue that these associative links may serve functions beyond word understanding, that is, the elaboration of situation models.
Brain imaging studies have brought forth compelling arguments for a distributed view of conceptual representation (Allport, 1985) by showing category-specific activities in distributed brain regions in healthy participants (Binder & Desai, 2011; Kiefer & Pulvermüller, 2012; Hwang, Palmer, Basho, Zadra, & Muller, 2009; Hoenig, Sim, Bochev, Herrnberger, & Kiefer, 2008; Goldberg, Perfetti, & Schneider, 2006; Martin, Wiggs, Ungerleider, & Haxby, 1996; Martin, Haxby, Lalonde, Wiggs, & Ungerleider, 1995). Crucially, concept representations and the specific sensory and motor properties on which they rely seem to include corresponding sensory-motor structures of the brain (Beauchamp & Martin, 2007; Martin, 2007; Martin & Chao, 2001; Tranel, Damasio, & Damasio, 1997).
Perhaps the most impressive data in favor of the involvement of such modal structures in the representation of conceptual knowledge come from studies that examined the neural correlates of word retrieval (Binder & Desai, 2011). Word-meaning is established by binding the distributed features underlying concept representations for the purpose of language use (Vigliocco & Vinson, 2007). Accordingly, processing words that refer to gustatory (Barrós-Loscertales et al., 2012), olfactory (González et al., 2006), or auditory (Kiefer, Sim, Herrnberger, Grothe, & Hoenig, 2008) sensations has been shown to trigger activity in the brain regions involved in the perception of such sensations (see also Goldberg et al., 2006). In the same manner, visual areas are active during the processing of color and shape words (Simmons et al., 2007; Pulvermüller & Hauk, 2006) or sentences with visual content (Desai, Binder, Conant, & Seidenberg, 2010). Processing words or sentences that denote motor actions, in turn, elicits activity in the brain areas that are responsible for the planning and programming of the depicted actions (Boulenger, Hauk, & Pulvermüller, 2009; Kemmerer, Castillo, Talavage, Patterson, & Wiley, 2008; Aziz-Zadeh, Wilson, Rizzolatti, & Iacoboni, 2006; Tettamanti et al., 2005; Hauk, Johnsrude, & Pulvermüller, 2004; Shtyrov, Hauk, & Pulvermüller, 2004).
Although most current theoretical positions regarding lexical-semantic representations suggest that large parts of the brain regions involved in word processing are not modality specific (Binder & Desai, 2011; see also Willems & Casasanto, 2011), the implication of modality-specific brain regions during word processing is widely acknowledged. However, how modality-specific structures contribute to language processing is still not understood. To advance this point, this study analyzed the acquisition trajectory associated with the mapping of novel words and referents using ERPs.
The study of word-meaning and concept representations in the brain has benefited from ERP studies as they helped discriminate neural markers underlying the processing of divergent semantic categories (Vigliocco, Vinson, Druks, Barber, & Cappa, 2011; Pulvermüller, 1999, 2005; Pulvermüller, Preissl, Lutzenberger, & Birbaumer, 1996). Many of these studies investigated how verbs and nouns were processed (Barber, Kousta, Otten, & Vigliocco, 2010; Pulvermüller, Lutzenberger, & Preissl, 1999; Pulvermüller, Mohr, & Schleichert, 1999; Koenig & Lehmann, 1996; Pulvermüller et al., 1996; Preissl, Pulvermüller, Lutzenberger, & Birbaumer, 1995). Pulvermüller and colleagues, for instance, compared ERPs to action verbs, action nouns, and visual object nouns. Using a current source density analysis, they showed that the spatial topographies of these words varied as a function of grammatical class, but, more crucially, the differences between the visual object nouns and the action nouns were comparable to those between the visual object nouns and the action verbs (Pulvermüller, Mohr, et al., 1999). This finding suggests that differences in processing verbs and nouns extend beyond grammatical properties. To further advance this issue, Barber et al. (2010) manipulated the grammatical class of words for words referring to motor or sensory events (e.g., “the smell” vs. “to sniff”; “the run” vs. “to run”). The results showed that ERP waves in the N400 temporal window (an index of semantic processing; see Kutas & Federmeier, 2011; Kutas & Hillyard, 1984) were modulated in a similar way by grammatical class and semantic attributes (see also Vigliocco et al., 2011, for discussion). Pulvermüller and collaborators (2006) identified distinct ERP patterns to color and form words at 200 msec post-word onset (Moscoso del Prado Martín, Hauk, & Pulvermüller, 2006; Pulvermüller, 2001). 
Because ERPs capture various word category-specific processes, analyzing brain potentials underlying the acquisition of words that form different semantic categories will help determine whether such specificity can be tracked back to word acquisition.
According to Pulvermüller's influential model of semantic representation, modal structures are recruited during language processing because of multimodal associations that occur during word acquisition (Pulvermüller, 1999, 2005, 2012). Semantic circuits for action words, for instance, include the motor and premotor cortices because action words are typically acquired and experienced in the context of action execution (Goldfield, 2000). Semantic circuits for visual words, in contrast, are thought to include the occipito-temporal visual regions because these words are typically associated with a visual percept. Following this idea, which builds on Hebb's postulate that the synchronous activity of neurons leads to the formation of novel neuronal assemblies (Hebb, 1949), the distributed representation of word-meaning and concept in the brain thus results from associations built during our interactions with the environment. In agreement with this prediction, several studies have recently shown that association learning rapidly generates novel neural circuits, for example, connecting auditory to motor processes through piano playing (Lahav, Saltzman, & Schlaug, 2007). Similarly, Bellebaum et al. (2013) recently showed that the perception of manipulable novel objects is modulated by sensorimotor experience and involves frontoparietal regions (see also Catmur et al., 2008; McNamara et al., 2008; Catmur, Walsh, & Heyes, 2007; Weisberg, van Turennout, & Martin, 2007). In the domain of language, Revill, Aslin, Tanenhaus, and Bavelier (2008) trained participants to associate novel verbal stimuli with motion changes of objects. Following training, language-induced activation could be discerned in cortical regions that support motion processing.
With the aim of directly testing Pulvermüller's hypothesis, our team recently trained participants to learn new action and visual words by associating novel verbal stimuli with the execution of an action or the observation of visual images, respectively. Through the analysis of motor-related brain activity (indicated by a desynchronization of the EEG in the μ frequency band [8–12 Hz; Gastaut, 1952]), we demonstrated that listening to novel words associated with the execution of actions triggered activity in the motor regions of the brain (Fargier et al., 2012). However, using such time–frequency analyses, we were not able to determine whether brain structures involved in visual processing were similarly recruited during the processing of words associated with the observation of images.
The purpose of this study was to reanalyze the ERP data reported in Fargier et al. (2012) using correspondence analysis (CA; Benzecri, 1980) to uncover a potentially distinct acquisition pattern for the “motor” and the “visual” words. CA, which is typically used in computational linguistics to reveal distance relationships between words of a corpus (see Ploux & Victorri, 1998), is a computational method that assesses the extent of matching between two variables. Applied to ERPs, it allows isolating orthogonal factors that best explain variations in the data across time intervals and electrodes (see Ploux, Dabic, Paulignan, Cheylus, & Nazir, 2012, for this innovative approach).
Mitchell and collaborators (2008) used a similar computational method to predict neural activity related to the processing of different nouns, based on the contextual use of the nouns. These authors showed that neural activity for a word such as celery can be predicted by the neural activity of other words that are semantically related to celery, such as eat, taste, and fill (Mitchell et al., 2008). In a similar manner, Chan, Halgren, Marinkovic, and Cash (2011) recently used machine-learning algorithms to demonstrate that, in addition to individual words, semantic categories can be decoded from EEG and MEG recordings.
In the present reanalysis, we applied CA to the ERP data recorded by Fargier et al. (2012). In that study, participants were asked to learn novel action words (action-based semantic category) by associating the execution of object-directed hand/arm movements with novel verbal stimuli (i.e., pseudowords); novel visual words (visual-based semantic category) were learned by associating the observation of animated visual images with novel verbal stimuli. Participants were tested before (Pretest) and after (Test 1) training on the first day of the experiment. On the second day of the experiment, participants were tested again before (Test 2) and after (Test 3) an additional training session. ERPs were recorded throughout all sessions. To capture processes typically associated with lexicosemantic processing (e.g., Friederici, 2002), our analyses focused on the 100–400 msec time window that followed stimulus onset.
Sixteen right-handed (scores = 0.79 ± 0.2; Oldfield, 1971), native French volunteers participated in the study (nine women; mean age = 24.2 ± 4 years). None of the participants had a history of psychiatric or neurological disorders; all participants had normal or corrected-to-normal vision. All participants gave written consent to take part in the study in accordance with the Helsinki Declaration (1968). The volunteers were paid for their participation.
The stimuli consisted of 20 different video clips (presented on a computer screen located 80 cm from the participants) and 20 novel verbal stimuli. Half of the clips consisted of reaching-and-grasping movements toward a horizontally or vertically oriented cylinder. The object-directed movement was performed with the right hand. Each movement started from a rest position (the hand on a table in front of the participant's torso) and ended with the grasping of an object (e.g., a zigzag movement). The other half of the clips consisted of animated artificial images, which differed in shape, color, and animation. Ten pseudowords and their temporally reversed counterparts (backward speech) served as spoken verbal stimuli (examples of pseudowords: “galou,” “munon,” “lival,” “chile”). The pseudowords were bisyllabic and were obtained by changing one or two letters of frequently written French words without violating the phonotactic rules of the French language (e.g., “sapin” [ ] → “napon” [ ]). The pseudowords were uttered by a male speaker. The average length of the verbal stimuli was 665 msec (±123). The average length of the video clips was 4600 msec (±1170) for movement clips and 3900 msec (±1290) for image clips. Verbal stimuli were presented binaurally through loudspeakers (see Fargier et al., 2012, for a more detailed explanation of the study and illustrations for the visual stimuli).
The experiment was conducted over two consecutive days; each day included three sessions. Day 1 began with a test session (Pretest) in which the participants were instructed to listen attentively to the verbal stimuli and to watch the video clips one by one. This Pretest was followed by a training session in which the participants learned to associate the verbal stimuli with the video clips. The training session was followed by another test session (Test 1), which was identical to the Pretest. Finally, a behavioral test assessed the effectiveness of learning. Day 2 began with a test session (Test 2) followed by a training session, a further test session (Test 3) and the behavioral test. Note that EEGs were recorded throughout all of the sessions (see Figure 1).
During the test sessions (Pretest, Test 1, Test 2, Test 3), the 20 verbal stimuli (pseudowords and backward speech) and the 20 video clips (image or reaching/grasping movement) were presented in isolation in a pseudorandom order. For each trial, participants were requested to fixate on a crosshair at the center of the computer screen, which lasted for 500 msec, and to pay attention to the auditory or visual stimuli subsequently presented. Each stimulus was presented five times.
Each training trial started with a fixation cross presented for 500 msec at the center of the screen. The cross was followed by the presentation of a video clip. Participants were instructed to watch the video. A white screen of 500 msec in duration marked the end of the clip. The same clip was then displayed again together with the verbal stimulus, which was presented shortly after the onset of the clip. When the clip depicted a movement, the onset of the verbal stimulus coincided with the beginning of the movement and participants were requested to imitate the movement while listening to the verbal stimulus. When the clip showed an artificial image, participants simply observed the image again while listening to the verbal stimulus. Participants were explicitly instructed to associate the verbal stimulus with the movement or the image depicted in the clip. Each verbal stimulus/video clip pairing was displayed 15 times, and the 300 trials, in total, were presented in pseudorandom order. The coupling of a given verbal stimulus with a given video clip was counterbalanced between participants such that the same verbal stimulus was associated with a movement for one participant and with an image for another participant. Note that in the training session of Day 2, each verbal stimulus/video clip pairing was displayed 10 times.
A behavioral test was performed at the end of each day. The 20 verbal stimuli were presented one by one, and participants were requested to indicate if the stimulus was associated with an image or a movement in the previous training session. For verbal stimuli coupled with a movement, participants had to reproduce the movement. For verbal stimuli coupled with an image, they were asked to describe shapes, colors, and animations.
EEG data were recorded using BrainAmp amplifiers (Brain Vision recorder software, Brain Products GmbH, Munich, Germany). EEGs were recorded from 32 scalp sites using the international 10–20 system with a forehead ground. Impedance was 10 kΩ or less at the start of the recording. All scalp sites were referenced to AFz and to the left mastoid. Horizontal and vertical eye movements were monitored using EOGs obtained from bipolar recordings from electrodes placed on the outer canthi of the left eye. The EEG was sampled at 500 Hz and filtered online using a 0.016–200 Hz frequency band.
EEG Data Preprocessing
EEG activity was analyzed using BrainVision Analyzer 2.0 software (Brain Products GmbH, Munich, Germany). To begin, a notch filter (50 Hz) was applied. The EEG was then re-referenced to an average reference (Bertrand, Perrin, & Pernier, 1985), excluding both the hEOG and the vEOG. A manual inspection of the raw data was performed to exclude segments containing obvious artifacts (e.g., movements). An ocular correction ICA (512 ICA steps; convergence bound of 1 × 10⁻⁷) was then performed and ERPs were low-pass filtered at 30 Hz. Data were grouped according to six conditions (pseudowords associated with an image and pseudowords associated with a movement at Pretest, Test 1, and Test 2) and were epoched from −500 msec to 1500 msec relative to stimulus onset. The trials were baseline-corrected (200 msec prestimulus baseline) and then averaged. Grand averages were calculated across participants. Whereas this first step was common to the analysis performed in Fargier et al. (2012), the next sections describe analyses that are specific to this study.
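The epoching, baseline correction, and averaging steps described above can be illustrated with a minimal sketch. It is not the BrainVision Analyzer pipeline; the function names and the toy epochs are hypothetical, and only the sampling rate (500 Hz), the epoch limits (−500 to 1500 msec), and the 200 msec prestimulus baseline are taken from the text.

```python
from statistics import fmean

SFREQ_HZ = 500          # sampling rate reported for the recording
EPOCH_START_MS = -500   # epoch start relative to stimulus onset
BASELINE_MS = 200       # length of the prestimulus baseline


def ms_to_index(t_ms):
    """Convert a latency (ms, relative to stimulus onset) to a sample index."""
    return round((t_ms - EPOCH_START_MS) * SFREQ_HZ / 1000)


def baseline_correct(epoch):
    """Subtract the mean of the 200 ms prestimulus interval from one epoch."""
    start, stop = ms_to_index(-BASELINE_MS), ms_to_index(0)
    baseline = fmean(epoch[start:stop])
    return [v - baseline for v in epoch]


def average_epochs(epochs):
    """Average baseline-corrected epochs sample by sample (one channel)."""
    corrected = [baseline_correct(e) for e in epochs]
    return [fmean(samples) for samples in zip(*corrected)]


# Toy data (hypothetical): two epochs with different DC offsets and a
# 2-unit step at stimulus onset.
n_samples = ms_to_index(1500)  # 2000 ms at 500 Hz
onset = ms_to_index(0)
epochs = [
    [offset + (2.0 if i >= onset else 0.0) for i in range(n_samples)]
    for offset in (5.0, -3.0)
]
erp = average_epochs(epochs)
```

In this toy case the DC offsets cancel after baseline correction, so the averaged trace is zero before onset and equals the step amplitude afterward.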
CA is a computational method that assesses the extent of matching between two variables (Ploux et al., 2012; Benzecri, 1980). When applied to ERP segments of a predefined time window, the CA will reveal electrodes and time intervals in which the amplitude difference between experimental conditions is maximal and follows a specific pattern of relation (e.g., amplitude of condition A > B > C or B > C > A). The CA considers the different experimental conditions as a whole, which helps to capture the organization underlying the entire set of data rather than studying differences taken two at a time as in conventional analyses. As applied here, the CA identifies recurring patterns at all electrodes within consecutive slots of 2 msec across the entire prespecified window. These patterns are then classified from the most to the least significant in terms of magnitude and frequency. By way of this classification, the CA identifies the first factors of a system of orthogonal axes that contribute to the largest variance between the tested conditions. The first factor captures the largest variation; the second factor captures the second largest variation and so on. By eliminating redundancy in the original data, the CA attempts to capture variations in the data with a smaller number of factors.
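To make the factor extraction concrete, here is a minimal sketch of classical correspondence analysis computed through a singular value decomposition. It is not the MATLAB implementation used in the study. Classical CA requires non-negative entries, and the text does not say how negative ERP amplitudes were handled, so the constant shift applied to the toy data below is purely an assumption, as are the function name and the random stand-in data.

```python
import numpy as np


def correspondence_analysis(table, n_factors=2):
    """Classical CA of a non-negative table via SVD of standardized residuals.

    Returns row principal coordinates, column standard coordinates, and the
    share of total inertia carried by each retained factor.
    """
    table = np.asarray(table, dtype=float)
    if (table < 0).any():
        raise ValueError("CA requires non-negative entries")
    P = table / table.sum()                 # correspondence matrix
    r = P.sum(axis=1)                       # row masses
    c = P.sum(axis=0)                       # column masses
    # Standardized residuals from the independence model r c^T
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2
    rows = (U[:, :n_factors] * sv[:n_factors]) / np.sqrt(r)[:, None]
    cols_std = Vt[:n_factors].T / np.sqrt(c)[:, None]
    return rows, cols_std, inertia[:n_factors] / inertia.sum()


# Toy stand-in for a conditions x (electrode, time) matrix: 6 rows, 40 columns.
rng = np.random.default_rng(0)
amplitudes = rng.normal(size=(6, 40))
# Assumption: shift the amplitudes so that every entry is positive.
shifted = amplitudes - amplitudes.min() + 1.0
rows, cols_std, explained = correspondence_analysis(shifted)
```

The two columns of `rows` are the coordinates of the conditions on the first two factors. The factors are ordered by the inertia they explain and are orthogonal under the row-mass weighting, which corresponds to the ordering and orthogonality described in the text.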
CA can be applied to temporal windows of any length, from very short to long periods. In this study, we chose a medium time window from 100 to 400 msec post-stimulus onset to capture processes typically associated with lexicosemantic processing (Boulenger, Shtyrov, & Pulvermüller, 2012; Hauk, Shtyrov, & Pulvermüller, 2008; Friederici, 2002). The CA was computed using MATLAB (The MathWorks, Inc., 1994–2011). To distinguish learning effects due to the first training session on Day 1 from those due to the second training session on Day 2, two CAs were performed. One CA included the Pretest, Test 1, and Test 2, and the other CA included the Pretest, Test 1, Test 2, and Test 3.
Application of the CA
CA was performed on a matrix $M^{\mathrm{mean}}_{100\text{-}400}$ that contained either six rows (one for each condition: 2 semantic categories × 3 test sessions) or eight rows (2 semantic categories × 4 test sessions) and 4500 columns (see Figure 2). The columns corresponded to a 2-msec sampling of the amplitude of the ERP between 100 and 400 msec post-word onset over the 30 electrodes (all electrodes except hEOG and vEOG). The principal plane output of the CA leads to a map on which the different conditions are plotted as a function of their coordinates on the first two axes (corresponding to the two variables that capture the largest amount of variance in the data).
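The matrix dimensions follow from the recording parameters, as the short sketch below shows. The helper names are hypothetical. The electrode-major column ordering is an assumption, as is the half-open 100–400 msec window, which is inferred from the fact that 30 electrodes × 150 samples gives the 4500 columns reported.

```python
SFREQ_HZ = 500                    # sampling rate, i.e. one sample every 2 ms
WINDOW_MS = (100, 400)            # analysis window after word onset
N_ELECTRODES = 30                 # scalp electrodes, EOG channels excluded
N_CATEGORIES, N_SESSIONS = 2, 3   # action/visual x Pretest, Test 1, Test 2

step_ms = 1000 // SFREQ_HZ
samples_per_electrode = (WINDOW_MS[1] - WINDOW_MS[0]) // step_ms
n_columns = N_ELECTRODES * samples_per_electrode
n_rows = N_CATEGORIES * N_SESSIONS


def flatten_condition(erp_by_electrode):
    """Concatenate the windowed ERP of every electrode into one matrix row."""
    row = []
    for samples in erp_by_electrode:
        if len(samples) != samples_per_electrode:
            raise ValueError("each electrode must contribute the full window")
        row.extend(samples)
    return row


def column_index(electrode, sample):
    """Column holding a given (electrode, sample) pair under this layout."""
    return electrode * samples_per_electrode + sample


# Dummy condition: 30 electrodes, 150 samples each.
dummy_row = flatten_condition([[0.0] * samples_per_electrode] * N_ELECTRODES)
```

With four test sessions instead of three, `n_rows` becomes eight while the number of columns is unchanged.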
Significance of map topology
Statistical analysis of the map topology consisted of determining the coordinates of the data of each participant for the six conditions in the orthogonal plane given by the CA. These coordinates are the result of the matrix product $(M^{\mathrm{subject}\,i}_{100\text{-}400})^{t} \times \mathrm{Coord}_{\mathrm{elec,time}}$, where $(M^{\mathrm{subject}\,i}_{100\text{-}400})^{t}$ is the transpose of $M^{\mathrm{subject}\,i}_{100\text{-}400}$, a matrix similar to $M^{\mathrm{mean}}_{100\text{-}400}$, and $\mathrm{Coord}_{\mathrm{elec,time}}$ is the matrix of the column vectors of $M^{\mathrm{mean}}_{100\text{-}400}$ in the orthogonal plane. Repeated-measures analyses of variance (ANOVAs) were performed to test the predictability of the categories using the coordinates of the conditions on the first axis (X dimension) and the second axis (Y dimension) for each participant. The main factors included Learning (the different test sessions) and Semantic Category (action based and visual based).
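One way to obtain per-participant coordinates on axes fitted to the grand-average matrix is to treat each participant's conditions as supplementary rows. The sketch below uses the standard CA transition formula (row profile multiplied by the column standard coordinates). The text does not specify the normalization behind its matrix product, so this is an assumption rather than the authors' exact computation. Function names, the 40-column toy size, and the random positive data are hypothetical.

```python
import numpy as np


def column_standard_coordinates(table, n_factors=2):
    """Column standard coordinates of a CA fitted on a non-negative table."""
    P = np.asarray(table, dtype=float)
    P = P / P.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    return Vt[:n_factors].T / np.sqrt(c)[:, None]


def project_rows(table, cols_std):
    """Place rows on the fitted axes as supplementary points.

    Each row is converted to a profile (row divided by its sum) and
    multiplied by the column standard coordinates.
    """
    table = np.asarray(table, dtype=float)
    profiles = table / table.sum(axis=1, keepdims=True)
    return profiles @ cols_std


rng = np.random.default_rng(1)
n_participants, n_conditions, n_columns = 16, 6, 40   # 40 columns keep the toy small
# Hypothetical positive-valued participant matrices (conditions x columns).
subjects = rng.uniform(1.0, 2.0, size=(n_participants, n_conditions, n_columns))
mean_table = subjects.mean(axis=0)

axes = column_standard_coordinates(mean_table)
coords = np.stack([project_rows(s, axes) for s in subjects])   # shape (16, 6, 2)
```

The resulting array holds one (X, Y) pair per participant and condition, which is the form of data that a repeated-measures ANOVA on the two dimensions would take as input.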
Conventional ERP Analysis
In contrast to the CA, which identifies the organization underlying the entire set of experimental conditions taken as a whole, conventional ERP analysis allows comparisons between conditions. Complementary to the CA, conventional ERP analyses were performed on the averaged mean amplitudes of the individual ERPs in the 100–400 msec temporal interval for electrode clusters identified by the CA (see Results section). Repeated-measures ANOVAs, including the factors Learning (Pretest, Test 1, Test 2, Test 3), Semantic Category (action-based, visual-based), ROI (see Results), and Electrodes, were applied.
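The dependent measure of the conventional analysis, a mean amplitude per electrode within a latency window, can be sketched as follows. The electrode labels are those of the fronto-central cluster reported in the Results; the helper names and the toy ERP are hypothetical, and the window is treated as half-open by assumption.

```python
from statistics import fmean

SFREQ_HZ = 500
EPOCH_START_MS = -500
FRONTO_CENTRAL = ("FC1", "Fz", "FC2", "FC4", "F4")


def ms_to_index(t_ms):
    """Convert a latency (ms, relative to stimulus onset) to a sample index."""
    return round((t_ms - EPOCH_START_MS) * SFREQ_HZ / 1000)


def window_mean(erp, start_ms, stop_ms):
    """Mean amplitude of one channel's ERP within [start_ms, stop_ms)."""
    return fmean(erp[ms_to_index(start_ms):ms_to_index(stop_ms)])


def cluster_means(erps, cluster, start_ms=100, stop_ms=400):
    """Window mean for each electrode of a cluster, keyed by electrode label."""
    return {name: window_mean(erps[name], start_ms, stop_ms) for name in cluster}


def toy_erp(amplitude, n_samples=1000):
    """Hypothetical ERP: flat except for a deflection between 100 and 400 ms."""
    lo, hi = ms_to_index(100), ms_to_index(400)
    return [amplitude if lo <= i < hi else 0.0 for i in range(n_samples)]


erps = {name: toy_erp(-1.5) for name in FRONTO_CENTRAL}
means_100_400 = cluster_means(erps, FRONTO_CENTRAL)
means_100_200 = cluster_means(erps, FRONTO_CENTRAL, 100, 200)
```

Computing these values per participant, session, and category yields the cells that enter the Learning × Semantic Category × ROI × Electrodes ANOVA.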
Participants had acquired all associations between verbal stimuli and video clips by the end of Day 1. Performance remained at ceiling on Day 2.
Learning Effects after Training on Day 1: Pretest, Test 1, and Test 2
The CA computed over the 100–400 msec post-stimulus onset time window produced a graphical display of the relationship between the conditions. Recall that here, we examined the relationship between data for action-based and visual-based novel semantic categories at three points across learning: before (Pretest) and after training (Test 1 on Day 1; Test 2 on Day 2). These relationships are reflected by the distances that separate the variables on the principal plane given by the CA. The horizontal (Factor 1) and vertical (Factor 2) axes that form the orthogonal plane divide the graph into four quadrants (upper and lower, left and right). Therefore, the close proximity of two conditions in a quadrant reflects a strong correlation between the conditions, whereas distance between the two conditions represents a strong difference.
As evident in Figure 3, the CA did not identify any differences between word classes at the Pretest, as reflected by the proximity of the two conditions in the bottom left quadrant (gray circles). At Test 1, however, the action-based category and the visual-based category segregate from each other compared with the Pretest; the action-based category is located in the top left quadrant (black dotted circle), and the visual-based category is located in the top right quadrant (red dotted circle). This distance between the action-based and the visual-based categories persists at Test 2, with the former located in the top left quadrant (black circle) and the latter located in the middle of the right half-plane (red circle).
To test the significance of the map, repeated-measures ANOVAs were performed on the coordinates of the conditions on the first axis (X dimension) and the second axis (Y dimension) for each participant. The main factors included the following: Dimension (first axis X; second axis Y), Learning (Pretest, Test 1, and Test 2), and Semantic Category (action based and visual based). The ANOVA revealed significant main effects of Dimension, F(1, 15) = 7.90, p < .05, Learning, F(2, 30) = 19.95, p < .01, and Semantic Category, F(1, 15) = 8.54, p < .05. The ANOVA also indicated significant interactions of Learning × Dimension, F(2, 30) = 7.63, p < .05, Semantic Category × Dimension, F(1, 15) = 20.93, p < .01, and Learning × Semantic Category × Dimension, F(2, 30) = 10.41, p < .01. Post hoc analyses (Fisher's LSD) revealed significant differences between the locations of the action-based category and the visual-based category on the first dimension at Test 1 (p < .01) and on both dimensions at Test 2 (all ps < .01). Effects of learning were also found for each semantic category. For the visual-based category, both dimensions segregated the Pretest from Test 1 and Test 2 (all ps < .01). For the action-based category, the first dimension discriminated the Pretest from Test 2 (p < .01), whereas the second dimension discriminated the Pretest from Test 1 and Test 2 (all ps < .01). Note that a general examination of Figure 3 shows that the first dimension discriminates the two semantic categories independently of the test sessions (post hoc: Pretest vs. Test 1 p < .01; Pretest vs. Test 2 p < .01), whereas the second dimension predicts learning effects with Pretest on the bottom and conditions after learning on the top (post hoc: p < .01). Therefore, the CA performed on the data from the 100 to 400 msec post-stimulus onset time window discriminated the ERPs as functions of learning and the semantic categories that result from learning.
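The degrees of freedom reported above follow from the design: 16 participants and a three-level within-subject factor give F(2, 30), and a two-level factor gives F(1, 15). The sketch below computes a one-way repeated-measures ANOVA by hand to show this. It covers a single within-subject factor only, not the full Dimension × Learning × Semantic Category design, and the data are synthetic values invented for illustration.

```python
from statistics import fmean


def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    `data[s][k]` is the score of subject s at level k of the within factor.
    Returns (F, df_effect, df_error).
    """
    n, k = len(data), len(data[0])
    grand = fmean(v for row in data for v in row)
    level_means = [fmean(row[j] for row in data) for j in range(k)]
    subject_means = [fmean(row) for row in data]
    # Variation between the levels of the within-subject factor
    ss_effect = n * sum((m - grand) ** 2 for m in level_means)
    # Subject x level residual, the error term of the within-subject effect
    ss_error = sum(
        (data[s][j] - level_means[j] - subject_means[s] + grand) ** 2
        for s in range(n) for j in range(k)
    )
    df_effect, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_effect / df_effect) / (ss_error / df_error)
    return f_value, df_effect, df_error


# Synthetic coordinates (hypothetical): 16 participants x 3 test sessions.
toy = [
    [0.1 * s + 0.5 * j + 0.2 * ((s * (j + 1)) % 3) for j in range(3)]
    for s in range(16)
]
f_learning, df1, df2 = rm_anova_oneway(toy)

# Same participants, two-level factor (e.g., two semantic categories).
f_two, df1_two, df2_two = rm_anova_oneway([row[:2] for row in toy])
```

A multifactor design partitions the variance further, with a separate subject × effect error term for each within-subject effect, but the degrees of freedom for each effect are obtained in the same way.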
Conventional ERP Analysis
The conventional ERP analyses first focused on the 100–400 msec post-stimulus interval underlying our CA. However, to determine whether the category-specific learning effects occurred sufficiently early to be attributed to lexical-semantic processing and not to processes that occur after the word has been identified, analyses were also performed on the ERP segments within a more restricted time window (100–200 msec) after stimulus onset.
Figure 4 plots topographic maps of the ERPs (100–400 msec interval), contrasting data from Test 2 (Day 2) with those from the Pretest (Day 1). Data are given separately for action-based words (left) and visual-based words (right). In the early time window, the topographic maps indicate a strong negativity at fronto-central electrodes for the action-based category (Figure 4A, left) but not for the visual-based category. For the visual-based category, in turn, a strong negativity is seen at occipito-parietal electrodes (Figure 4A, right). Two clusters of electrodes were thus extracted from the topographic maps for further analyses: The first cluster included frontal (F) and fronto-central (FC) electrodes (FC1, Fz, FC2, FC4, and F4) and will be referred to as the fronto-central cluster. The second cluster included occipital (O) and parietal (P) electrodes (P7, O1, Oz, O2, and P8) and will be referred to as the occipito-parietal cluster. Figure 4B plots the mean amplitudes of the ERPs across the three test sessions according to word categories for the two clusters (in the figure, data are pooled over electrodes). A repeated-measures ANOVA that contrasted the mean amplitudes of the ERPs as functions of learning and ROI was performed. The main factors included Learning (Pretest vs. Test 1 vs. Test 2), Semantic Category (action-based vs. visual-based), ROI (fronto-central vs. occipito-parietal), and Electrodes. The ANOVA revealed significant main effects of Learning, F(2, 30) = 4.60, p < .05, ROI, F(1, 15) = 11.93, p < .01, and Electrodes, F(4, 60) = 3.72, p < .01. The ANOVA also indicated a significant interaction of Learning × ROI × Semantic Category, F(2, 30) = 4.12, p < .05. Post hoc analyses revealed different effects as a function of ROI. In the fronto-central region, a significant effect of Learning was observed (Pretest vs. Test 2; p < .05) for the action-based category only. 
In the occipito-parietal region, significant effects of Learning were observed for the visual-based category (Pretest vs. Test 2, p < .05; Test 1 vs. Test 2, p < .05). Finally, an effect of Semantic Category was observed for Test 2 (p < .05).
The same ANOVA within the more restricted time window (100–200 msec after stimulus onset) showed significant main effects of ROI, F(1, 15) = 4.02, p < .05, and Electrodes, F(4, 60) = 3.07, p < .05. The main effect of Learning, F(2, 30) = 3.24, p = .053, narrowly missed significance. The ANOVA also indicated a significant interaction of Learning × ROI × Semantic Category, F(2, 30) = 4.5, p < .05. Post hoc analyses showed a significant effect of Learning (Pretest vs. Test 2, p < .05) for the action-based category in the fronto-central region and for the visual-based category in the occipito-parietal region (Pretest vs. Test 2, p < .05). These analyses suggest that the neural processes underlying action word and visual word processing encompass different neural circuits.
Learning Effects after Training on Day 2: Pretest, Test 1, Test 2, and Test 3
With further training on Day 2, that is, when the CA included Test 3, the regularity in the data dissolved (Figure 5). The reason for this is that ERPs for Test 3 did not follow the trajectory observed previously. (Recall that to identify recurring patterns, the CA considers the entire set of data. By adding Test 3, the whole output of the analysis changes; thus, the results that we observed without Test 3 will no longer be present in the new analysis.)
Topographic maps contrasting data from Test 3 (Day 2) with those from the Pretest (Day 1) reveal an attenuation of effects seen for the contrast between Test 2 (Day 2) and the Pretest (Day 1). The ERP traces at the previously identified clusters showed no differences between the Pretest and Test 3 (Figure 6).
How semantic categories are encoded in the brain has traditionally been investigated through patient cases or by studying healthy participants with developed semantic knowledge. However, investigating the processes that underlie the acquisition of words from divergent semantic categories could provide important insights into the organization of the lexicon in the brain. Therefore, the purpose of this study was to depict how a visual-based word semantic category and an action-based word semantic category, acquired through a sensory-motor experience, were processed in the time course of learning. We used a novel approach that combined CA with conventional ERP analyses to show that the emergence of divergent semantic categories can be monitored from acquisition and that their encoding relies on different neural processes.
When applied to the first three test sessions (i.e., before the second training), the CA revealed a clear two-factor structure: An organization separating visual-based and action-based words on the one hand and an organization structured by test sessions on the other hand. In other words, word-referent relations and learning sessions were the two factors that caused the strongest variations in the ERP data across all experimental conditions. Visual-based and action-based novel words were thus segregated after only a few hours of training, although the two types of words could not be differentiated before training. Conventional ERP analyses also revealed specific learning-induced activity for newly acquired action-based and visual-based words in the frontal and occipital-temporal regions, respectively. This differentiation was evident 100–200 msec after the word onset. The learning-induced increase in the distance between the action-based and visual-based categories revealed by the CA could thus be related to category-specific activities in the frontal and occipito-temporal regions.
The depicted regularity dissolved when data from the last test session were included in the CA, because ERPs in this last session no longer followed the systematic trajectory observed in the other sessions. This alteration of learning effects with further training has two likely explanations. The first is that the diminution of learning effects could stem from a temporary attenuation of stimulus-evoked neural activity because of stimulus repetition. This so-called “repetition suppression,” which combines neuronal adaptation and attention-dependent expectation effects (Larsson & Smith, 2012), results from a reduction/optimization of the size of the neuronal ensemble that reacts to repetitive stimuli (Löfberg, Julkunen, Tiihonen, Pääkkönen, & Karhu, 2013). Because participants had acquired all word-referent associations by the end of the first day, the additional training on the second day could have provoked such suppression in the ensuing test session (such repetition suppression was less likely to affect Test 2 because it was the first test of Day 2).
Alternatively, there could be a qualitative shift in the way novel words are represented over time. Perceptual and motor features associated with a word may be recruited early during acquisition but may stop being relevant when the word becomes more familiar. Such an account is in line with dual-process accounts of word acquisition (e.g., McClelland, McNaughton, & O'Reilly, 1995); according to this theory, word learning includes an initial encoding in the form of episodic memories that is then abstracted away to form long-term representations (see Davis & Gaskell, 2009, for a review). In line with such models, several authors report that the formation of cortical circuits for novel words requires a night of sleep (Dumay & Gaskell, 2007; Gaskell & Dumay, 2003). Dumay and Gaskell (2007), for instance, argue that although novel words can be encoded rapidly, their full integration in the mental lexicon, as indexed by competition with preexisting knowledge (i.e., “similar-sounding” words), occurs only overnight. We do not reject the idea that qualitative changes in lexical representation may occur with time. Note, however, that a qualitative shift in word representations that neutralizes category-specific effects appears incompatible with the fact that category-specific brain activity is seen for well-established words (Barber et al., 2010; Moscoso del Prado Martín et al., 2006; Pulvermüller, Mohr, et al., 1999; see Kiefer & Pulvermüller, 2012; Vigliocco et al., 2011, for reviews).
Putting the data from the last test session aside, the present results are compatible with numerous reports of category-specific effects in studies on already established word categories (Ploux et al., 2012; Pulvermüller, Lutzenberger, et al., 1999; Pulvermüller, Mohr, et al., 1999; Pulvermüller et al., 1996; see Pulvermüller, 2001, 2012; Vigliocco et al., 2011, for reviews). In particular, previous ERP studies have indicated that words with strong visual associations (mostly nouns) and words with strong motor associations (mostly verbs) trigger electrocortical activity over the occipital and anterior frontal regions, respectively (Hauk & Pulvermüller, 2004; Pulvermüller, Lutzenberger, et al., 1999; Pulvermüller, Mohr, et al., 1999). It is believed that these differences are a consequence of neural activity in or close to the motor and visual cortices that underlie the processing of corresponding sensory-motor information (Pulvermüller, 2005). Consistent with this assumption, fMRI experiments have shown that words that refer to actions executed by different body parts (e.g., with the face, arms or legs) activate the motor and premotor cortices in a somatotopic fashion (Boulenger et al., 2009; Kemmerer et al., 2008; Tettamanti et al., 2005; Hauk et al., 2004; Shtyrov et al., 2004). Similarly, words with strong visual attributes tend to activate the visual areas (Desai et al., 2010; Hwang et al., 2009; Goldberg et al., 2006). The present results add to these observations by demonstrating that the development of such word representations takes different (neural) routes right from the start. Together with other learning experiments that show that brain responses to novel verbal stimuli change rapidly with training (Shtyrov, Nikulin, & Pulvermüller, 2010; see also Dobel et al., 2010; Mestres-Missé, Rodriguez-Fornells, & Münte, 2007, 2010), our results support the notion of a “fast track” process of word learning that relies on neocortical circuits (Shtyrov, 2011).
Moreover, as predicted by Pulvermüller (1999), our data reinforce the idea that language-induced sensory and motor activity may have its origin in association learning during acquisition.
Nonetheless, do such observations reflect the involvement of modality-specific regions in the construction of linguistic meaning? The significance of language-induced sensory-motor activity for word processing has repeatedly been dismissed as a post-comprehension effect (Mahon & Caramazza, 2008; Tomasino, Fink, Sparing, Dafotakis, & Weiss, 2008; Toni, De Lange, Noordzij, & Hagoort, 2008; Tomasino, Werner, Weiss, & Fink, 2007; Mahon & Caramazza, 2005). However, the temporal pattern of this activity argues for the recruitment of sensory-motor structures during lexical access. Lexicosemantic processes are rapid and occur within the first 250 msec post-stimulus presentation (Hauk et al., 2008; Hauk, Davis, Ford, Pulvermüller, & Marslen-Wilson, 2006; Pulvermüller, Shtyrov, & Ilmoniemi, 2005; Friederici, 2002). Differential activation of the sensory-motor areas has been observed as early as 150 msec post-stimulus onset (Moseley, Pulvermüller, & Shtyrov, 2013; Boulenger et al., 2012; Kiefer et al., 2008; Pulvermüller et al., 2005; Hauk & Pulvermüller, 2004; Shtyrov et al., 2004; Pulvermüller, Harle, & Hummel, 2001). Recently, MacGregor, Pulvermüller, van Casteren, and Shtyrov (2012) even observed lexically modified activation at 50 msec post-stimulus onset. Here, we report differential electrocortical activity to novel visual-based and action-based words at 100–200 msec post-stimulus onset. It seems difficult to link such early differentiation to post-comprehension processes.
Embodied theories of semantic representation argue that perceptual and motor systems support conceptual knowledge (Kiefer & Pulvermüller, 2012; Binder & Desai, 2011; Gallese & Sinigaglia, 2011; Barsalou, 2008; Gallese & Lakoff, 2005). Within this theoretical frame, the involvement of modality-specific regions in word-meaning retrieval reflects the sensory and motor experiences associated with the referent of the words. The observed differentiation between words that were associated with the execution of object-directed hand movements and words that were associated with the observation of animated visual images could reflect the retrieval of perceptual and motor information experienced during training. Note, however, that sensory and motor structures are not constantly engaged in word processing (Willems & Casasanto, 2011). In fact, recent psycholinguistic studies that investigated word processing within sentential contexts have shown that, for the same action word, language-induced motor activity can switch on and off depending on whether the action is the focus of the sentence (e.g., it is present in a sentence such as “Fiona lifts her luggage” but absent in sentential negation, such as “Fiona does not lift her luggage,” or in a volition context, such as “Fiona plans to lift her luggage”; Aravena et al., 2012, 2014; see also Taylor & Zwaan, 2008). Linguistic and most likely extralinguistic contexts thus determine how links between language and sensory-motor structures that develop during acquisition will serve language processing. Determining the conditions under which language-induced sensory and motor activity occurs will ultimately help define its functional role.
In this study, we investigated the neural processes underlying visual-based and action-based word learning by combining CA with conventional ERP analyses. Applying CA or related methods to brain imaging data is a relatively new approach that has attracted an increasing number of researchers over the last few years (Ploux et al., 2012; Chan et al., 2011; Mitchell et al., 2008). Here, this method allowed us to demonstrate that the two main factors underlying the variability in our ERP data were word semantics (i.e., a differentiation of ERP traces between semantic categories) and learning (i.e., a differentiation of ERP traces between test sessions). Conventional ERP analyses added to this finding by identifying the electrodes that show maximal learning effects for one or the other category and by showing how ERP traces changed over time within the predefined temporal window. By combining CA with information from ERP amplitudes, electrodes, and temporal intervals, we revealed that, right from the start, different neural processes are associated with the processing of newly acquired semantic categories. Together with other studies that reported category-specific effects for well-established words (Barber et al., 2010; Goldberg et al., 2006; González et al., 2006; Moscoso del Prado Martín et al., 2006; Hauk & Pulvermüller, 2004; Pulvermüller, Mohr, et al., 1999; Koenig & Lehmann, 1996; Preissl et al., 1995), the differentiation between semantic categories reported here could reflect the retrieval of perceptual and motor information experienced during the acquisition of words. The context-dependent flexibility in the way language recruits sensory-motor structures (Aravena et al., 2012, 2014) suggests that associative links to sensory and motor structures may serve functions beyond word understanding, for example, the elaboration of situation models (Graesser, Millis, & Zwaan, 1997; Johnson-Laird, 1983) that help listeners to optimally interact with the environment.
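For readers unfamiliar with CA, the decomposition that yields factors such as the category and session factors discussed above can be sketched as follows. This is a minimal illustration of classical correspondence analysis computed through a singular value decomposition; it is not the Matlab implementation used in the study. The function name `correspondence_analysis` and the toy table are hypothetical. The sketch also assumes a table with strictly positive row and column sums and no negative entries, whereas raw ERP amplitudes can be negative; how the amplitudes were prepared before CA is not specified in this section.

```python
import numpy as np


def correspondence_analysis(table):
    """Classical CA of a non-negative table (hypothetical helper, illustration only)."""
    N = np.asarray(table, dtype=float)
    if N.min() < 0:
        raise ValueError("CA requires non-negative entries")
    P = N / N.sum()                      # correspondence matrix (sums to 1)
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # independence model
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * s) / np.sqrt(r)[:, None]     # principal row coordinates
    cols = (Vt.T * s) / np.sqrt(c)[:, None]  # principal column coordinates
    inertia = s ** 2                     # inertia carried by each factor
    return rows, cols, inertia


# Invented toy table: rows are category x session conditions,
# columns stand in for electrode/time bins. Values are made up.
toy = np.array([
    [5.0, 5.1, 4.9, 5.0],   # action, pretest
    [5.1, 5.0, 5.0, 4.9],   # visual, pretest
    [6.5, 6.0, 4.8, 4.9],   # action, post-training
    [5.0, 4.9, 6.2, 6.6],   # visual, post-training
])
rows, cols, inertia = correspondence_analysis(toy)
print(np.round(inertia / inertia.sum(), 3))  # share of inertia per factor
print(np.round(rows[:, :2], 3))              # conditions on the first two factors
```

In such a sketch, conditions that lie far apart on a factor have dissimilar profiles across the columns, which is the sense in which a factor can be said to separate categories or sessions.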
We would like to thank Clément Graindorge and Sonia Dupin for the implementation of the CA in Matlab.
Reprint requests should be sent to Raphaël Fargier, L2C2-Language, Brain and Cognition Laboratory, Institute of Cognitive Science, 67 Bd Pinel, 69675 Bron, France, or via e-mail: firstname.lastname@example.org.
Backward speech was used to determine the effects of stimulus familiarity. As in Fargier et al. (2012), data for these stimuli are not further considered.