The “distractor-frequency effect” refers to the finding that high-frequency (HF) distractor words slow picture naming less than low-frequency distractors in the picture–word interference paradigm. Rival input and output accounts of this effect have been proposed. The former attributes the effect to attentional selection mechanisms operating during distractor recognition, whereas the latter attributes it to monitoring/decision mechanisms operating on distractor and target responses in an articulatory buffer. Using high-density (128-channel) EEG, we tested hypotheses from these rival accounts. In addition to conducting stimulus- and response-locked whole-brain corrected analyses, we investigated the correct-related negativity, an ERP observed on correct trials at fronto-central electrodes proposed to reflect the involvement of domain general monitoring. The whole-brain ERP analysis revealed a significant effect of distractor frequency at inferior right frontal and temporal sites between 100 and 300-msec post-stimulus onset, during which lexical access is thought to occur. Response-locked, region of interest (ROI) analyses of fronto-central electrodes revealed a correct-related negativity starting 121 msec before and peaking 125 msec after vocal onset on the grand averages. Slope analysis of this component revealed a significant difference between HF and low-frequency distractor words, with the former associated with a steeper slope on the time window spanning from 100 msec before to 100 msec after vocal onset. The finding of ERP effects in time windows and components corresponding to both lexical processing and monitoring suggests the distractor frequency effect is most likely associated with more than one physiological mechanism.
It is generally accepted that spoken word production involves selecting a target word from a range of activated lexical candidates. This process has been referred to as the main decision mechanism in language production (Levelt, 1989). According to most theoretical models, candidate words are activated via a process of spreading activation from conceptual to lexical representations (see Goldrick, 2007; Levelt, Roelofs, & Meyer, 1999). However, there is considerable disagreement about whether the selection of the target word is accomplished by competitive or noncompetitive mechanisms. Because of the core importance of the lexical selection process in speech production, understanding the way it is performed (by competition or not) is essential. Here, we attempted to shed light on this issue by studying the brain dynamics associated with picture naming in a paradigm at the center of the debate between competitive versus noncompetitive accounts of lexical selection.
Models implementing noncompetitive selection typically assume a “horse race” mechanism in which the lexical candidate with the highest level of activation is produced after passing a predetermined threshold or after a certain number of time steps (e.g., Mahon, Costa, Peterson, Vargas, & Caramazza, 2007; Caramazza, 1997; Dell, 1986). Competitive lexical selection models instead assume that the time taken to produce a target is a function of the number of activated candidates, with the target selected when a critical difference in activation levels is achieved (e.g., Levelt et al., 1999; Starreveld & La Heij, 1996). One of the major sources of evidence for the latter type of model has come from experimental manipulations using the picture–word interference (PWI) paradigm (Rosinski, 1977). In the conventional PWI paradigm, participants are asked to name a target picture and to ignore an accompanying distractor word. When distractor words are manipulated in terms of categorical relations with the target (e.g., picture FOX, distractor pig), target naming latencies are slower in comparison with unrelated distractors (e.g., FOX–pen)—an effect called “semantic interference.” The predominant explanation of this effect assumes that the activation of the lexical representation of the distractor will spread to its semantically related neighbors, making the selection of the target lexical representation more difficult if it is semantically related to the distractor than if it is not. Indeed, it will take longer to reach a critical difference between the activation levels of the target and the distractor if the two are from the same semantic category. Semantic interference thus reflects the additional time taken to resolve the increased lexical competition.
However, findings using a novel manipulation in the PWI paradigm have been argued to challenge this competition account: Miozzo and Caramazza (2003) introduced a manipulation of the lexical frequency of the distractor word in the PWI paradigm in the absence of a categorical relation with the target. They hypothesized that, if lexical selection is by competition, high-frequency (HF) distractor words should compete more with the picture name than low-frequency (LF) words because they have higher activation levels. Instead, they found that LF distractor words slowed target picture naming more than HF words, a finding that has been replicated multiple times (e.g., Dhooge & Hartsuiker, 2010; Dhooge, De Baene, & Hartsuiker, 2013; Geng, Schnur, & Janssen, 2014; Starreveld, La Heij, & Verdonschot, 2013; de Zubicaray, Miozzo, Johnson, Schiller, & McMahon, 2012; Catling, Dent, Johnston, & Balding, 2010). A broad framework for explaining differential naming latencies in the PWI paradigm involves processing capacity constraints: Reading a word is faster than naming a picture (Cattell, 1885, as discussed in Levelt, 2012). Thus, distractor words will be processed faster than target pictures. Therefore, delays in target naming will reflect the time required to process each type of distractor (de Zubicaray et al., 2012; Miozzo & Caramazza, 2003). However, there is disagreement about the locus of the processing delay resulting in the distractor frequency effect. One class of explanations holds that the delay occurs early, before or as the name of the picture is accessed, whereas another argues for a late locus, after the name of the picture has been accessed.
Within one prominent competitive lexical selection account—the WEAVER++ model—the distractor frequency effect has been interpreted in terms of an attentional mechanism, distractor blocking by a condition–action rule, sensitive to the frequency of the distractor word (Roelofs, 2005; see also Roelofs, Piai, & Schriefers, 2011). The account assumes HF words will be read more quickly than LF words and will therefore be blocked more quickly. According to Roelofs et al. (2011), the speed of blocking is dependent on an initial processing response to the distractor information, and this processing may span word recognition to word form encoding. Starreveld et al. (2013) have similarly proposed an input account involving different recognition thresholds for HF and LF words. The differential threshold account also assumes that the activation level of a distractor word decays upon recognition in a manner proportional to its level of activation (i.e., exponential decay). Thus, both of the above explanations predict that the distractor frequency effect occurs early in time, at a lexical level or earlier, while preserving a competitive lexical selection account.
Alternative accounts of the distractor frequency effect in PWI place its locus at a postlexical stage of processing. Building on Miozzo and Caramazza's (2003) proposal of a task-specific distractor blocking mechanism in PWI, the response exclusion account (e.g., Mahon et al., 2007; Finkbeiner & Caramazza, 2006) assumes that distractor words have a privileged relationship with the articulators as reading is considered to be more automatic than picture naming: The distractor word enters an articulatory buffer as a phonologically well-formed response before the phonological representation of the picture name (target). The resulting bottleneck is then solved by a decision mechanism. According to this account, the relative speed of entry and removal of the distractor into and out of the buffer will determine the presence or absence of an interference effect. The selection of the correct response then occurs closer to the response output rather than before accessing the name of the picture, as suggested by the input account. As HF distractors are assumed to be read more quickly and enter the buffer earlier, they are excluded more quickly than LF words. Thus, the account assumes that a task-specific, postlexical, noncompetitive selection mechanism is responsible for the distractor frequency effect in PWI. As the response exclusion account was devised solely to explain PWI effects and its decision mechanism was relatively underspecified, Dhooge and Hartsuiker (2010) sought to incorporate the account within the broader framework of speech production by equating its operations with those of the verbal self-monitoring system (e.g., Hartsuiker & Kolk, 2001; Levelt, 1989).
As both input and output accounts can potentially explain the distractor frequency effect in naming latencies, other means are needed to establish its locus. Using fMRI with a sparse temporal sampling acquisition, de Zubicaray and colleagues (2012) tested whether the distractor frequency effect was associated with differential activity in brain regions predicted by input or output accounts. They derived hypotheses based on Indefrey's (2011) meta-analysis of neuroimaging and electrophysiological studies of spoken word production. This meta-analysis identified reliable roles for the mid-to-posterior sections of the middle and superior temporal gyri (STG) in lexical-level processes of lexical–conceptual selection and phonological word form retrieval, respectively, within a predominantly left-lateralized network. According to the predictions made by this meta-analysis, during picture naming, activity in these two regions typically occurs between 100 and 400 msec after object recognition, with postlexical processes including articulation occurring between 400 and 600 msec in left inferior frontal and premotor cortical areas, respectively (see also Strijkers & Costa, 2011, for time course estimates). In addition, the meta-analysis identified a role for bilateral posterior STG in verbal self-monitoring. The fMRI data revealed significant differential activity in bilateral medial frontal (ACC and SMA) and lateral premotor cortices, in addition to posterior STG. However, no differential activity was observed in the mid-middle temporal gyrus, leading de Zubicaray et al. to conclude that the distractor frequency effect most likely had a postlexical locus per the output account.
While spatially informative, fMRI using the sparse acquisition methodology is unable to provide time course information. Hence, it remains possible that some or all of the regions observed in the de Zubicaray et al. study could have been activated during lexical rather than postlexical time windows. Our main aim in the current study was therefore to use EEG to determine the time course of activity associated with the distractor frequency effect. The application of EEG for researching overt speech has increased in recent years (Ganushchak, Christoffels, & Schiller, 2011), aided by the development of analysis techniques for reducing articulation-related electromyographic artifacts, which otherwise heavily pollute EEG signal (e.g., de Vos et al., 2010). In addition to the time course of the activation of the brain regions reviewed above, several studies have indicated that spoken word production might also engage aspects of domain general monitoring systems that are sensitive to conflict during information processing (e.g., Acheson, Ganushchak, Christoffels, & Hagoort, 2012; Riès, Janssen, Dufau, Alario, & Burle, 2011; de Zubicaray, 2006; see Nozari, Dell, & Schwartz, 2011, for a detailed theoretical account). In particular, a response-locked, fronto-central negative potential peaking shortly after both erroneous and correct vocal responses has been observed reliably in production paradigms. These ERPs, initially observed during performance of nonlinguistic tasks, have been referred to as the error- and correct-related negativities (ERN and CRN), respectively, and are proposed to have a common source in ACC and/or SMA (e.g., Bonini et al., 2014; Roger, Bénar, Vidal, Hasbroucq, & Burle, 2010; Debener et al., 2005; Dehaene, Posner, & Tucker, 1994). 
As this response-locked negativity is observed for both correct and incorrect trials, it has been interpreted as reflecting a general response monitoring rather than error-detection system (Riès et al., 2011; Vidal, Burle, Bonnet, Grapperon, & Hasbroucq, 2003; Vidal, Hasbroucq, Grapperon, & Bonnet, 2000). The amplitude of the CRN is usually smaller than the ERN in healthy participants. Importantly, the negative potential emerges before vocal onset, that is, before auditory feedback from an overt vocal response can be perceived, indicating that it is likely to reflect monitoring of internal rather than external speech (Riès, Xie, Haaland, Dronkers, & Knight, 2013; Riès et al., 2011). The precise nature of the representations being monitored remains a matter of debate (e.g., Acheson & Hagoort, 2014).
To date, the only EEG study to have examined the distractor frequency effect is that of Dhooge et al. (2013). Those authors reported stimulus-locked analyses from 31 electrodes showing three separate effects with LF distractors showing more negative amplitudes: the first over left and central electrodes at 20–60 msec, the second occurring between 420 and 500 msec over all electrodes, and the third between 520 and 580 msec again over left and central electrodes. Dhooge et al. (2013) interpreted the latter two effects as being consistent with the response exclusion account and operation of the verbal self-monitor (e.g., Mahon et al., 2007), as they occurred later than lexical-level processes that typically occur within the first 400 msec. The initial effect was interpreted as being too early for frequency-related word recognition processes that are typically reported in the 150- to 400-msec time window (for review, see Laszlo & Federmeier, 2014) and so unlikely to reflect a distractor blocking mechanism (e.g., Roelofs et al., 2011).
The present study differed from that of Dhooge et al. (2013) in three main ways. First, we investigated the ERP correlates of the distractor frequency effect using both stimulus- and response-locked analyses of EEG data to test hypotheses from the input and output accounts. Although stimulus-locked analyses of EEG data are able to provide some information about the time courses of processes involved in spoken word production (e.g., Dhooge et al., 2013; Blackford, Holcomb, Grainger, & Kuperberg, 2012), response-locked analyses have been argued to be better suited to observe later effects linked to the production of the response (Riès, Janssen, Alario, & Burle, 2013). In particular, the ERN and CRN suggested to reflect response monitoring are measured response locked (Riès et al., 2011; Vidal et al., 2000, 2003). As speech monitoring is thought to be one of the mechanisms sensitive to distractor frequency (according to the output account), we investigated this component in particular. In addition, we performed stimulus-locked analyses to test for a potential early effect of distractor frequency as postulated by input accounts. Second, we also aimed to provide some level of spatial information by using the Laplacian transformation (as in Riès, Janssen, et al., 2013) and high-density EEG recording. Finally, we addressed the problem caused by articulation-related electromyographic artifacts, prominent in scalp EEG studies of overt speech production. We used a blind-source separation algorithm based on canonical correlation analysis (BSS-CCA), enabling us to observe clean EEG signal time locked both to stimulus onset and to vocal onset (as shown in Riès, Janssen, et al., 2013; Riès, Xie, et al., 2013; Riès et al., 2011).
Twenty undergraduate students at the University of Queensland participated in the experiment (10 women; mean age = 23 years, SD = 3.63 years). All were right-handed and native English speakers, with no history of neurological or psychiatric disorder, substance dependence, or known hearing deficits. All had normal or corrected-to-normal vision and gave informed consent in accordance with the protocol approved by the behavioural and social sciences research ethics committee of the University of Queensland.
The materials were identical to those employed by de Zubicaray et al. (2012) and Catling et al. (2010; Experiment 1). Forty-eight black and white line drawings were chosen from the Snodgrass and Vanderwart (1980) corpus. HF and LF word distractors were also matched on a range of linguistic variables including age of acquisition (AoA; more information on the matching variables can be found in the appendix of Catling et al., 2010). Each target picture was paired with an HF and an LF word that did not share a semantic or phonological relationship with it.
The stimuli were presented on a 21-in. CRT monitor (NEC, Itasca, IL, Accusync 120, resolution = 1024 × 768) placed 60 cm in front of the participant. The visual angle approximated 3°. Black and white target pictures (300 × 300 pixels) and superimposed distractor words were presented centrally on a white background. The visual distractor words were presented in red Arial 50-point font.
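As a worked check on the viewing geometry, the physical extent that subtends a given visual angle follows from size = 2 × distance × tan(angle / 2). The sketch below applies this standard relation to the reported 3° and 60 cm; the function name is illustrative and is not part of the experiment software.

```python
import math

def stimulus_size_cm(visual_angle_deg: float, distance_cm: float) -> float:
    """Physical extent subtending a visual angle at a given viewing distance."""
    return 2.0 * distance_cm * math.tan(math.radians(visual_angle_deg) / 2.0)

# Reported setup: about 3 degrees of visual angle at 60 cm from the monitor.
size_cm = stimulus_size_cm(3.0, 60.0)
print(f"{size_cm:.2f} cm")
```

Under these values the stimulus would span a little over 3 cm on the screen.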
Stimuli presentation, response recording, and latency measurement (i.e., voice key) were accomplished online via the Cogent 2000 toolbox extension (www.vislab.ucl.ac.uk/cogent_2000.php) for MATLAB (2010a, MathWorks, Inc., Natick, MA) using a personal computer equipped with a noise-cancelling microphone (Logitech, Inc., Lausanne, Switzerland).
Participants were first familiarized with the set of picture stimuli and their corresponding labels below them. In two subsequent blocks, they viewed the pictures without labels and were instructed to name them. The experimenter corrected erroneous naming responses.
After familiarization, participants completed three experimental blocks consisting of 96 trials each. There were 48 word-pair combinations, and each pair was repeated twice per block. There were 144 trials per frequency condition (HF/LF) for a total of 288 trials. Participants were instructed to name the pictures as quickly and accurately as possible while ignoring the superimposed distractor word. They were also instructed not to correct themselves if they made an error. Stimuli were presented in the following sequence. A fixation point was shown for 250 msec, followed by a blank screen for 500 msec, and then the target–distractor pair was shown for 750 msec. The intertrial interval was 3 sec. Naming latencies were determined online with voice-key code implemented in the Cogent 2000 toolbox, and responses were verified off-line using Audacity software (http://audacity.sourceforge.net) in case nonvocal noise triggered the voice key.
The EEG was recorded from 128 Ag–AgCl preamplified electrodes using a BioSemi Active Two EEG system (Amsterdam, The Netherlands). The sampling rate was 1024 Hz. The vertical EOG was recorded by means of two surface electrodes above and below the left eye, respectively. The horizontal EOG was recorded with two electrodes positioned over the two outer canthi.
Trials including incorrect or omitted naming responses and speech dysfluencies (e.g., hesitations, stuttering) were scored as errors and excluded from analyses because of their low rate (1.7%). Naming latencies faster than 350 msec and slower than 2000 msec were also excluded (1.9%).
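The exclusion rules above can be sketched as a simple trial filter. This is a minimal illustration, not the scoring procedure actually used: the `Trial` record, the `usable` function, and the example trials are hypothetical, and only the 350- and 2000-msec cutoffs come from the text.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    rt_ms: float    # naming latency in milliseconds
    correct: bool   # False for errors, omissions, and dysfluencies

def usable(trial: Trial, fastest_ms: float = 350.0, slowest_ms: float = 2000.0) -> bool:
    """Keep correct trials whose latency lies within the reported cutoffs."""
    return trial.correct and fastest_ms <= trial.rt_ms <= slowest_ms

# Hypothetical trials: one valid, one too fast, one too slow, one error.
trials = [Trial(768.0, True), Trial(310.0, True), Trial(2150.0, True), Trial(790.0, False)]
kept = [t for t in trials if usable(t)]
print(len(kept), "of", len(trials), "trials kept")
```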
EEG Data Preprocessing
Five participants were excluded from the EEG signal processing because their EEG signals had too many artifacts to permit useful analysis (their recordings had 50% or more of trials rejected). We report analyses performed on the remaining 15 participants (nine women; mean age = 23.8 years, SD = 3.6 years).
Channels C8, C32, D28, B1, and B9 were rejected from the data of the participants under analysis because the signal recorded at these channels contained too many artifacts in some of the participants.
After acquisition, the EEG data were filtered (high-pass = 0.16 Hz) and resampled at 256 Hz. Vertical eye movement artifacts were then corrected through independent component analysis (ICA) as implemented in EEGLAB (Delorme & Makeig, 2004). For each participant, we manually determined the ICA component that best reflected eye blinks by looking at both their waveforms and their topographies. The waveforms of these components were compared with that of the raw EEG signal to match for eye blink location in time. We removed only the component that clearly captured the eye blinks and had a clear anterior and symmetrical topography.
Speaking induces large facial EMG activities that contaminate the EEG signal. To reduce the EMG artifacts induced by articulation, we used BSS-CCA (de Clercq, Vergult, Vanrumste, Van Paesschen, & Van Huffel, 2006), which separates sources based on their autocorrelation. The suitability of BSS-CCA for removing articulatory EMG bursts from EEG signal is described in detail in de Vos et al. (2010), and the method has been used successfully to study monitoring-related components (Riès, Xie, et al., 2013; Riès et al., 2011). In the current study, the BSS-CCA method was applied twice: first on nonoverlapping consecutive windows of 30 sec to target tonic EMG activity produced by continuous contraction of the facial or neck muscles and second on nonoverlapping consecutive windows of 1.5 sec (average RT = 775 msec, σ = 164 msec) to target local EMG bursts (this was done automatically using the EEGLAB plug-in Automatic Artifact Removal implemented by Gomez-Herrero, available at http://www.cs.tut.fi/gomezher/projects/eeg/software.htm#aar). EMG-related components were selected according to their power spectral density. As explained in de Vos et al. (2010), components were considered to be EMG activity if their average power in the EMG frequency band (approximated by 15–30 Hz) was at least one fifth of the average power in the EEG frequency band (approximated by 0–15 Hz). The use of BSS-CCA was preferred over that of ICA for the separation of EEG sources from EMG sources based on previous investigations showing that ICA could not separate these sources optimally and was less specific than BSS-CCA for this particular application (e.g., de Vos et al., 2010; de Clercq et al., 2006). The benefits of BSS-CCA are shown on the power spectra of the response-locked grand averages calculated over a large time window (from 1000 msec before the vocal onset to 500 msec after the vocal onset; Figure S1).
Before muscle artifact removal, the power spectra show substantial high-frequency activity across a broad frequency range. After BSS-CCA, this high-frequency activity is clearly reduced. The topography of the difference in spectral content before versus after BSS-CCA shows that the removed signal is mainly located at lateral frontal and temporal recording sites, where the muscular artifacts are most prominent.
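The power-ratio criterion for labeling a component as EMG can be illustrated with a short sketch. It is not the Automatic Artifact Removal plug-in code: the function names are hypothetical, the spectrum is computed with a plain discrete Fourier transform, the band edges are treated as half-open intervals (an assumption), and the two test signals are synthetic 1-sec sinusoids.

```python
import cmath
import math

def power_spectrum(signal, fs):
    """Naive DFT power spectrum for the non-negative frequencies."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power

def band_mean(freqs, power, lo_hz, hi_hz):
    """Average power over frequencies in the half-open band [lo_hz, hi_hz)."""
    vals = [p for f, p in zip(freqs, power) if lo_hz <= f < hi_hz]
    return sum(vals) / len(vals)

def looks_like_emg(signal, fs, ratio=0.2):
    """True if mean 15-30 Hz power is at least one fifth of mean 0-15 Hz power."""
    freqs, power = power_spectrum(signal, fs)
    eeg_power = band_mean(freqs, power, 0.0, 15.0)
    emg_power = band_mean(freqs, power, 15.0, 30.0)
    return emg_power >= ratio * eeg_power

fs = 256  # sampling rate after resampling, in Hz
slow = [math.sin(2 * math.pi * 5 * t / fs) for t in range(fs)]   # 5-Hz, EEG-like
fast = [math.sin(2 * math.pi * 22 * t / fs) for t in range(fs)]  # 22-Hz, EMG-like
print(looks_like_emg(slow, fs), looks_like_emg(fast, fs))
```

In practice the criterion would be applied to each BSS-CCA component within each window rather than to raw sinusoids.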
After the BSS-CCA procedure, all remaining artifacts were manually rejected by a trial-by-trial visual inspection of the monopolar recordings. Particular attention was paid to small local artifacts to allow for the subsequent use of Laplacian transformation, which is more sensitive to local small artifacts than the more commonly used monopolar recordings. The remaining EEG recordings were averaged, individually, to stimulus presentation and to vocal onset. Laplacian transformation (i.e., current source density estimation), as implemented in Brain Analyser™ (BrainProducts, Munich, Germany), was applied to each participant's averages and to the grand averages as in Riès et al. (Riès, Janssen, et al., 2013; Riès, Xie, et al., 2013; Riès et al., 2011; degree of spline: 3°, Legendre polynomial: 15° maximum). Laplacian transformation has been shown to increase the spatial resolution of the signal providing a good estimation of the corticogram (Nuñez, 1981). Components therefore appear more focal after Laplacian transformation than on the more commonly used monopolar recordings. We assumed a radius of 10 cm for the sphere representing the head. The resulting unit was microvolts per square centimeter (μV/cm²). A 30-Hz low-pass filter and a 1-Hz high-pass filter were applied off-line on the EEG data. For the purpose of cluster-based permutation testing, we also computed Laplacian transformation on the individual trials of each participant.
Repeated-measures analyses of variance were conducted with naming latencies as the dependent variable and distractor frequency as within-participant (F1) and within-item (F2) factors. Errors were not subjected to analyses because of their low rate.
Three different types of analysis were performed on the EEG data to detect the differences between the ERPs for HF versus LF distractor words.
For whole-brain analyses, all 123 scalp electrodes were included in the tests. Time-locked to the stimulus, the test was performed on all time points between 0 and 500 msec (i.e., 15,867 total comparisons), and any electrodes within approximately 5.44 cm of one another were considered spatial neighbors (assuming a 56-cm average head circumference). The baseline was the 200-msec time window ranging from 200 msec before stimulus onset to stimulus onset. Repeated-measures t tests were performed for each comparison using the original data and 2,500 random within-participant permutations of the data. For each permutation, all t scores corresponding to uncorrected p values of .05 or less were formed into clusters. The sum of the t scores in each cluster is the “mass” of that cluster, and the most extreme cluster mass in each of the 2,501 sets of tests (derived from the 2,500 permutations and from the real data) was recorded and used to estimate the distribution of the null hypothesis. We used 2,500 permutations to estimate the distribution of the null hypothesis as it is over twice the number recommended by Manly (1997) for a family-wise alpha level of .05.
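The logic of the cluster mass permutation test can be sketched for a single electrode as follows. This is a simplified illustration under stated assumptions, not the analysis code used here: clusters are formed over adjacent time points only (no spatial neighbors), within-participant permutation is implemented as a random sign flip of each participant's HF minus LF difference wave, the critical t value is hard-coded for 15 participants, and the data are simulated.

```python
import math
import random

T_CRIT_DF14 = 2.145  # two-tailed .05 critical t for df = 14 (15 participants)

def t_values(diffs):
    """One-sample t score per time point; diffs[p][t] is HF minus LF for participant p."""
    n = len(diffs)
    scores = []
    for t in range(len(diffs[0])):
        col = [row[t] for row in diffs]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        scores.append(mean / math.sqrt(var / n))
    return scores

def max_cluster_mass(ts, t_crit):
    """Largest absolute summed t over runs of adjacent, same-sign supra-threshold points."""
    best, mass = 0.0, 0.0
    for t in ts:
        if abs(t) > t_crit and (mass == 0.0 or (t > 0) == (mass > 0)):
            mass += t          # extend the current cluster
        elif abs(t) > t_crit:
            mass = t           # sign changed: start a new cluster
        else:
            mass = 0.0         # below threshold: close the cluster
        best = max(best, abs(mass))
    return best

def cluster_permutation_p(diffs, n_perm=2500, t_crit=T_CRIT_DF14, seed=0):
    """Return the observed maximum cluster mass and its permutation p value."""
    rng = random.Random(seed)
    observed = max_cluster_mass(t_values(diffs), t_crit)
    exceed = 1  # the real data count as one of the n_perm + 1 sets of tests
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * x for x in row])
        if max_cluster_mass(t_values(flipped), t_crit) >= observed:
            exceed += 1
    return observed, exceed / (n_perm + 1)

# Simulated data: 15 participants, 20 time points, an effect at points 8 to 13.
demo_rng = random.Random(1)
diffs = [[demo_rng.gauss(0.0, 1.0) + (2.0 if 8 <= t < 14 else 0.0) for t in range(20)]
         for _ in range(15)]
observed_mass, p_value = cluster_permutation_p(diffs, n_perm=500)  # fewer permutations for speed
print(round(observed_mass, 1), p_value)
```

Because only the most extreme cluster mass per permutation enters the null distribution, the resulting p value is corrected for the many time points tested.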
Time-locked to the response, we performed the same analysis but on all time points between 500 msec before vocal onset and vocal onset with a baseline corresponding to the 200-msec time window between 1000 and 800 msec before vocal onset. We also performed a whole-brain test on the 200 msec after vocal onset with a baseline taken from 200 to 100 msec before vocal onset (i.e., 6,396 comparisons).
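The reported numbers of comparisons can be recovered from the number of electrodes and the number of samples in each window. The sketch below assumes the 256-Hz resampled rate and that both edges of the window are counted as samples; that counting convention is an inference that happens to reproduce the reported totals, not a documented detail.

```python
def n_comparisons(n_electrodes: int, window_ms: float, fs_hz: int = 256) -> int:
    """Electrodes times time points, counting both window edges as samples."""
    n_samples = int(window_ms * fs_hz / 1000) + 1
    return n_electrodes * n_samples

stimulus_locked = n_comparisons(123, 500)  # 0 to 500 msec after stimulus onset
post_response = n_comparisons(123, 200)    # 0 to 200 msec after vocal onset
print(stimulus_locked, post_response)
```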
We also performed ROI-type analyses time-locked to stimulus and vocal onset (Figure 1). These tests were performed using the same time windows and the same baseline corrections as the whole-brain analyses. Given the fMRI results reported by de Zubicaray et al. (2012) and our hypotheses concerning lexical-level processes and monitoring, we restricted our analysis to a fronto-central cluster of electrodes around the equivalent to FCz/Fz in a 64-electrode system (C20, C21, C22, C23, C24, C25, C12, C11), a left temporal cluster (D23, D22, D21, D26, D29, D30, D31, D24, D25), and a right temporal cluster (B26, B25, B24, B16, B15, B14, B13, B12, B11). We note that the signal was not averaged over electrodes. The number of comparisons was greatly reduced in these analyses compared with whole-brain analyses (between 416 and 1,152 comparisons).
We focused more closely on the fronto-central component identified as the CRN and performed statistical analyses on the slope of the activity and peak-to-peak amplitudes of Laplacian-transformed data, as in previous studies (Riès, Janssen, et al., 2013; Riès, Xie, et al., 2013; Riès et al., 2011). A negative peak could not be identified within the time window centered around the latency of the peak on the grand averages for 4 of the 15 participants whose data were kept for analysis. We thus measured, in each participant, the surface below the curve on a 50-msec time window around the latency of the peak as measured on the grand average. We also measured, in each participant, the surface below the curve on a 50-msec time window around the latency of the preceding positive dip on the grand average. We then subtracted this surface measure from the one corresponding to the negative peak, and it is this surface difference that we refer to as the peak-to-peak amplitude. This type of measurement was preferred over taking the real difference between two peaks to reduce the contribution of noise. Slopes were measured by fitting a linear regression to the data, and the statistical existence of the component was tested by comparing its slope with zero. This measure was chosen because it is also independent of the baseline and gives morphological information about the data (Carbonnell, Hasbroucq, Grapperon, & Vidal, 2004).
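The two measures described above can be sketched as follows. This is an illustrative reconstruction under assumptions: the function names are hypothetical, the surface is approximated with the trapezoidal rule, the waveform is a synthetic straight line sampled every 4 msec, and the two latencies passed to `peak_to_peak` are placeholders rather than measured values.

```python
def slope(times_ms, values):
    """Least-squares slope of amplitude against time."""
    n = len(times_ms)
    mean_t = sum(times_ms) / n
    mean_v = sum(values) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_ms, values))
    sxx = sum((t - mean_t) ** 2 for t in times_ms)
    return sxy / sxx

def window_area(times_ms, values, center_ms, width_ms=50.0):
    """Trapezoidal surface under the curve in a window centered on center_ms."""
    pts = [(t, v) for t, v in zip(times_ms, values) if abs(t - center_ms) <= width_ms / 2]
    return sum((t1 - t0) * (v0 + v1) / 2 for (t0, v0), (t1, v1) in zip(pts, pts[1:]))

def peak_to_peak(times_ms, values, neg_peak_ms, pos_dip_ms):
    """Surface around the negative peak minus surface around the preceding positive dip."""
    return (window_area(times_ms, values, neg_peak_ms)
            - window_area(times_ms, values, pos_dip_ms))

# Synthetic waveform: amplitude falls linearly from positive to negative values.
times = [float(t) for t in range(-200, 201, 4)]
wave = [-0.01 * t for t in times]

# Slope over the 200-msec window centered on vocal onset.
window = [(t, v) for t, v in zip(times, wave) if -100.0 <= t <= 100.0]
crn_slope = slope([t for t, _ in window], [v for _, v in window])
amplitude = peak_to_peak(times, wave, neg_peak_ms=125.0, pos_dip_ms=-121.0)
print(crn_slope, amplitude)
```

Per-participant slopes obtained this way can then be compared with zero, or between conditions, with Student t tests.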
Analysis of mean naming response times (RTs) revealed a significant effect of Distractor condition by both participants (F1[1, 19] = 15.02, MSE = 182.2, p < .001, partial η2 = .44) and items (F2[1, 47] = 8.9, MSE = 671.8, p < .005, partial η2 = .16). Pictures with LF words (mean = 784 msec) were named, on average, 16 msec slower (95% confidence interval of difference = 9 msec) than pictures with HF distractors (mean = 768 msec), replicating previous results (Catling et al., 2010, Experiment 1; Dhooge et al., 2013; Dhooge & Hartsuiker, 2010; de Zubicaray et al., 2012).
The whole-brain analysis of the 500 msec after stimulus onset revealed a significant effect of Distractor frequency at two right inferior frontal sites (C6 and C7) and one right temporal site (B25; Figure 2A). The Laplacian-transformed ERPs started to diverge at around 100 msec after stimulus onset, and the difference remained until around 300-msec poststimulus (start: 86, 116, and 137 msec; finish: 332, 344, and 359 msec poststimulus at C7, C6, and B25, respectively). Although the topographies of the difference wave show earlier left frontal and temporal foci, no significant difference was found at those sites (Figure 2B). We also note that the difference is not clearly visible at B25 on the topographies, although it can be seen on the waveforms (Figure 2C); this is because of the choice of baseline (−200 to −100 msec for the figures).1
Stimulus-locked ROI-type analyses did not reveal any significant effect of Distractor frequency. We note that these ROIs did not include the electrodes showing significant effects in the whole-brain analysis (i.e., C6, C7, and B25).
None of the response-locked whole-brain analyses revealed any significant effect of Distractor frequency.2 At the fronto-central ROI, there was an effect at electrode C21 (corresponding to Fz in the 10–20 system; Figure 3). This effect started 20 msec after vocal onset and lasted until 121 msec after vocal onset.3 No other ROI-type analysis revealed any significant effect of Distractor frequency. We describe the effect on this fronto-central component in more detail below.
The fronto-central component started 121 msec before and peaked 125 msec after vocal onset on the grand averages, highly resembling the CRN (Figure 3B and C). Irrespective of distractor frequency condition, its slope was significantly different from zero on the 200-msec time window centered on vocal onset (t = −2.05, p < .05; a one-tailed Student t test was used given the direction of the difference was expected based on previous reports by Riès et al. and Vidal et al. [2000, 2003]). The slope analysis also revealed a significant difference between HF and LF distractor words, where HF distractor words were associated with a steeper slope than LF distractor words on the time window spanning from 100 msec before to 100 msec after vocal onset (t = −2.43, p < .05, two-tailed Student t test). The peak-to-peak amplitude measured between the negative peak and the preceding positive dip revealed that HF distractor words were associated with a larger CRN than the LF distractor words (t = 3.07, p < .01, two-tailed Student t test; see Figure 3C for topography of the difference wave after vocal onset).
Using high-density EEG recordings, we contrasted input and output accounts of the distractor frequency effect in spoken word production. We replicated the distractor frequency effect in picture naming latencies using stimuli identical to those of previous studies (e.g., de Zubicaray et al., 2012; Catling et al., 2010). Our ERP results show significant effects of distractor frequency time locked both to stimulus presentation and to vocal onset. Stimulus-locked effects started as early as 100 msec and lasted until around 300 msec after stimulus onset and were confined to right inferior frontal and temporal cortex recording sites. Response-locked distractor frequency effects were found shortly after vocal onset on a fronto-central component corresponding to the CRN. These results suggest that the distractor frequency effect is most likely associated with more than one physiological mechanism and possibly reconcile rival input versus output accounts of this effect. We discuss the possible nature of these mechanisms below.
According to the competitive lexical selection (or input) account, the distractor frequency effect reflects either a differential recognition threshold for HF and LF words (e.g., Starreveld et al., 2013) or an attentional mechanism that implements reactive blocking during processing of distractors that potentially encompasses word recognition up to word form encoding (WEAVER++; Roelofs et al., 2011). The time window of the effects observed over right inferior frontal and temporal sites in the stimulus-locked analyses indicates the operation of relatively early mechanisms. A recent review of ERP studies of word recognition indicated that word-frequency-related components are reported reliably in the 100- to 400-msec window over central, left, and right hemisphere sites (see Laszlo & Federmeier, 2014). The effects observed here were predominantly right lateralized. This lateralization could be interpreted in terms of the recruitment of right-hemisphere-dominant attentional mechanisms that are known to operate during language tasks (e.g., Petersen & Posner, 2012; Vigneau et al., 2011; Roelofs, 2003). This would be in agreement with the input account postulating an early attentional mechanism involved in blocking the processing of the distractor word (Roelofs et al., 2011; Roelofs, 2005). Similarly, the effect at right inferior frontal sites could be considered consistent with the operation of an inhibitory control mechanism (e.g., Aron, Robbins, & Poldrack, 2014) that would favor processing of the target picture over processing of the distractor word by reactively blocking the latter (e.g., Roelofs et al., 2011). Response inhibition is closely related to resistance to distractor interference (e.g., Friedman & Miyake, 2004). We note that differences between distractor frequency conditions were visible at other sites on the difference wave topographies (Figure 2B). Although these did not reach statistical significance, the type of analysis we used may have missed them. Indeed, as mentioned in the Methods section, cluster mass permutation tests have limited sensitivity to short-lived effects. In addition, the large number of electrodes used in the whole-brain analysis yielded a large number of comparisons, which may have disadvantaged smaller effects. Some of the sites where differences could be seen on the difference waves were not included in the ROI analyses (e.g., left inferior frontal sites), as these were not over brain regions identified as being sensitive to this effect in earlier studies and we did not have a priori reasons for targeting these regions (de Zubicaray et al., 2012). However, our point here is mainly to note the existence of early distractor frequency effects, as these had not been observed previously and are in agreement with one of the hypotheses tested.
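To make the point about short-lived effects concrete, the following Python sketch implements a simplified cluster mass permutation test along the time dimension of a single electrode. It is not the procedure used in this study: the sign-flipping permutation scheme, the threshold, the number of permutations, and the simulated difference waves (12 hypothetical participants, unit-variance Gaussian noise) are all assumptions made for illustration.

```python
import math
import random

T_CRIT = 2.201  # two-tailed .05 critical t for df = 11 (12 hypothetical participants)

def t_series(diff_waves):
    """One-sample t value at each time point across participants' difference waves."""
    n = len(diff_waves)
    out = []
    for samples in zip(*diff_waves):
        m = sum(samples) / n
        var = sum((x - m) ** 2 for x in samples) / (n - 1)
        out.append(m / math.sqrt(var / n))
    return out

def cluster_masses(t_values, threshold):
    """Sum t over each run of adjacent same-sign samples with |t| > threshold."""
    masses, current, sign = [], 0.0, 0
    for t in t_values:
        s = (t > threshold) - (t < -threshold)  # +1, -1, or 0
        if s != 0 and s == sign:
            current += t
        else:
            if sign != 0:
                masses.append(current)
            current, sign = (t, s) if s != 0 else (0.0, 0)
    if sign != 0:
        masses.append(current)
    return masses

def max_abs_mass(diff_waves, threshold):
    masses = cluster_masses(t_series(diff_waves), threshold)
    return max((abs(m) for m in masses), default=0.0)

def cluster_p_value(diff_waves, threshold=T_CRIT, n_perm=500, seed=0):
    """Largest observed cluster mass and its sign-flip permutation p value."""
    rng = random.Random(seed)
    observed = max_abs_mass(diff_waves, threshold)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for wave in diff_waves:
            sign = rng.choice((-1.0, 1.0))  # flip a whole participant at once
            flipped.append([sign * v for v in wave])
        if max_abs_mass(flipped, threshold) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

def simulate(effect_start, effect_end, n_subj=12, n_time=60, amp=2.0, seed=1):
    """Hypothetical HF-minus-LF difference waves: Gaussian noise plus a boxcar effect."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) + (amp if effect_start <= j < effect_end else 0.0)
             for j in range(n_time)] for _ in range(n_subj)]

sustained = simulate(15, 45)  # effect lasting 30 samples
brief = simulate(29, 31)      # same amplitude, lasting 2 samples
mass_long, p_long = cluster_p_value(sustained)
mass_short, p_short = cluster_p_value(brief)
print(mass_long, p_long)
print(mass_short, p_short)
```

Because cluster mass sums t values over adjacent supra-threshold samples, a brief effect contributes far less mass than a sustained effect of the same amplitude and is therefore more easily matched by chance clusters in the permutation distribution.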
Dhooge et al. (2013) predicted “effects of distractor frequency should be evident only after 350 ms” if the response exclusion account is correct (p. 233, emphasis added). However, the stimulus-locked activity observed in the 100- to 300-msec time window is clearly not consistent with this account. Could the finding of a negative-going ERP sensitive to distractor frequency in the response-locked fronto-central ROI analysis be used to support an output account?
This effect was visible on the CRN, which started to rise around 100 msec before vocal onset. This is in the time window attributed to articulatory preparation according to meta-analyses (Indefrey, 2011; Indefrey & Levelt, 2004) and would thus support an output locus of the distractor frequency effect, within this theoretical framework. However, Hutson and Damian (2013) have recently demonstrated a distractor frequency effect with manual classifications in two separate experiments, indicating that the effect is unlikely to involve articulatory-motor programs and might instead involve a relatively earlier, more abstract level of representation. As we noted in the Introduction, the fronto-central CRN has been observed in both psycholinguistic and nonpsycholinguistic tasks and is therefore interpreted as reflecting the operation of a domain general monitor (e.g., Acheson et al., 2012; Riès et al., 2011). In spoken word production, the ERN/CRN is likely to reflect monitoring of internal rather than external speech as it arises before the response is made (Riès, Xie, et al., 2013; Riès et al., 2011). As Hutson and Damian (2013) note, there is considerable evidence that we are able to monitor our inner speech at a relatively abstract prearticulatory level. Thus, a self-monitoring account of the distractor frequency effect might be plausible if a domain general monitor was assumed to operate on relatively abstract representations.
Thus, the input and output accounts may be reconcilable on the basis of our observations. We can speculate about how an early attentional or inhibition mechanism and speech monitoring might be linked. Monitoring is assumed to be always in place during speech production, meaning it is sensitive to the accuracy of each step of speech production (Dhooge & Hartsuiker, 2010). The normal function of the monitoring system(s) is to inspect internal and external speech for problems, and this function is assumed to operate relatively independently of the mechanism for lexical selection (e.g., Hartsuiker & Kolk, 2001; Levelt, 1989). Thus, the fronto-central CRN observed for the distractor frequency effect might reflect the ubiquitous operation of the monitor, checking the outcome of lexical selection, which may itself be facilitated by an early attentional blocking mechanism (e.g., Starreveld et al., 2013; Roelofs et al., 2011). Moreover, it is possible that speech monitoring is generally engaged more strongly in the more demanding condition, although the early attentional blocking mechanism is usually able to block out the distractor most of the time. According to this interpretation, there might be more than one locus or physiological mechanism responsible for the distractor frequency effect. This interpretation is appealing as it has the potential of reconciling both input and output accounts (see also van Maanen, van Rijn, & Borst, 2009, for an account that assumes interference can be distributed over multiple stages of processing).
Before accepting the above interpretation, it is worth noting the consistencies and inconsistencies with prior neuroimaging and electrophysiological studies of the distractor frequency effect. Our findings of significant ERPs at right temporal and fronto-central sites are consistent with the results of a previous fMRI study that reported bilateral activity in these regions and so provide complementary information about the time courses of those effects (e.g., de Zubicaray et al., 2012). Despite the topographies of the difference wave showing early left frontal and temporal foci, results at those sites did not reach statistical significance in this study (see also limitations of the statistics used above). In addition, no right inferior frontal activity was observed in the earlier fMRI study. Although EEG and fMRI provide complementary information, it is not unusual to find effects present in one modality that are absent in the other because of the differing nature of hemodynamic and electrophysiological signals (e.g., Geukes et al., 2013; Vartiainen, Liljeström, Koskinen, Renvall, & Salmelin, 2011; Van Petten & Luka, 2006).
We were unable to replicate the stimulus-locked effects reported by Dhooge et al. (2013) in their recent lower-density EEG study with Dutch-speaking participants, despite testing a subset of comparable electrode sites. The reason for this discrepancy is not immediately apparent, although it could reflect differences in the way stimuli were constructed or distractors were presented across the studies. For example, this study used the English language stimuli created by Catling et al. (2010) and employed by de Zubicaray et al. (2012) in their fMRI study, replicating those results in terms of naming latencies. Catling et al.'s (2010) HF and LF distractor stimuli were carefully matched in terms of AoA among other lexical variables. Lexical frequency and AoA are usually highly confounded. This confound might influence ERP results, as frequency and AoA effects involve different neurophysiological mechanisms (see Hutson & Damian, 2013; de Zubicaray et al., 2012; Catling et al., 2010). In addition, distractor–target picture stimuli were presented for fixed durations of 750 msec in this study, whereas the duration of Dhooge et al.'s (2013) stimulus presentation was more variable, with stimuli remaining on screen until the participant responded. Moreover, our study differed from that of Dhooge et al. (2013) in several methodological aspects of signal processing and analysis (including the eye-blink removal technique, articulation-related EMG artifact removal, filtering, baseline correction, and statistical tests used). These could also have influenced the results.
Finally, we would like to frame our results in the broader context of the cognitive architecture of language production, while acknowledging the task-specific nature of the mechanisms involved in the distractor frequency effect in PWI. The distractor frequency effect has been used to inform the process of lexical selection, a core decision mechanism in language production. Miozzo and Caramazza's (2003) initial account of this effect was framed against the competitive account of lexical selection. Indeed, they hypothesized that, if lexical selection was a competitive process between highly activated lexical representations, then the HF distractor words should compete more with the picture name than the LF distractor words, as HF words are thought to be more highly activated than LF words. Instead, the slower naming latencies in the LF versus HF distractor conditions were interpreted as inconsistent with the notion of competition at the level of lexical selection. Our results suggest that the distractor frequency effect is associated first with an early attentional mechanism that preferentially blocks HF distractor words, as hypothesized by the WEAVER++ input account of this effect (e.g., Roelofs et al., 2011). This attentional blocking mechanism operates within the time window typically attributed to conceptual and lexical access. Second, the distractor frequency effect is also associated with a domain-general monitoring mechanism that verifies the performance of the early attentional selection mechanism. Note that this is a different monitoring mechanism from that proposed by Dhooge and Hartsuiker (2010) and Dhooge et al. (2013). In their account of the distractor frequency effect, the self-monitor checks the content of the output buffer and initiates a time-consuming correction to purge the incorrect distractor response. This is an earlier process than that proposed here, which we envisage entails postselection response verification consistent with the proposed operations of the CRN rather than an error-detection system per se. As Dhooge et al. (2013) themselves noted, after response selection has occurred, “the only process left will be the checking of the picture's response” before articulation is initiated (p. 233). Thus, our results point to the importance of monitoring processes at different stages of language production (see Postma, 2000, for a similar perspective). In addition, the observation of dual loci for the distractor frequency effect emphasizes the fact that multiple physiological mechanisms can be responsible for a given behavioral effect.
We tested rival input and output accounts of the distractor frequency effect in picture naming using both stimulus- and response-locked analyses of ERPs recorded with high-density EEG. According to input accounts, the locus of the effect should occur during processing that encompasses word recognition to form encoding and thus be completed within the initial 400 msec after stimulus presentation. By contrast, the output account assumes a later locus during processing of articulatory representations, potentially reflecting the involvement of the self-monitoring system. Our results indicate that the distractor frequency effect most likely reflects the operation of more than one physiological mechanism. We argue that these mechanisms involve early attentional processes in addition to domain general monitoring of relatively abstract, prearticulatory representations. If correct, this account has the potential to reconcile input and output accounts of the distractor frequency effect and point to the importance of domain-general cognitive control processes in language production.
This research was supported by a postdoctoral grant from the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under award number F32DC013245 to S. K. R. and a University of Queensland Research Foundation grant to G. Z. G. Z. is supported by an Australian Research Council Future Fellowship (FT0991634). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We are grateful to Chase Sherwell for his assistance with data acquisition and scoring, to Jonathan Mustri for his assistance with data pretreatment, and to Vitória Piai for helpful discussion.
Reprint requests should be sent to Stephanie K. Riès, Knight Lab, Helen Wills Neuroscience Institute, University of California, 132 Barker Hall, Berkeley, CA 94720-3190, or via e-mail: email@example.com.
1. None of these effects were observed if articulation-related EMG artifacts were not removed (i.e., before BSS-CCA). ERPs did not differ significantly between conditions (alpha = .05) at any time point/window analyzed (all ps ≥ .4128). This underlines the impact of articulation-related EMG artifacts even at this early stimulus-locked time window. We note that there were also no stimulus-locked effects in the ROI analyses (fronto-central ROI: all ps ≥ .8472; left temporal ROI: all ps ≥ .264; right temporal ROI: all ps ≥ .2208).
2. The response-locked whole-brain analysis may have failed to reveal effects that the ROI analysis detected because response-locked averages are often noisier than stimulus-locked averages. This is because the detection of voice onset, which constitutes the time-locking event for response-locked averages, is more variable than that of stimulus onset.
3. This effect was also not present before BSS-CCA (all ps ≥ .4184). We note that there was also no response-locked effect in the whole-brain analysis (all ps ≥ .928) or in the other response-locked ROI analyses (left temporal ROI: all ps ≥ .1176; right temporal ROI: all ps ≥ .2296).