New theories of monitoring in language production, regardless of their mechanistic differences, all posit monitoring mechanisms that share general computational principles with action monitoring. This perspective, if accurate, would predict that many electrophysiological signatures of performance monitoring should be recoverable from language production tasks. In this study, we examined both error-related and feedback-related EEG indices of performance monitoring in the context of a typing-to-dictation task. To disentangle the contributions of external and internal monitoring processes, we compared a condition in which participants immediately saw the word they typed (the immediate-feedback condition) with one in which displaying the word was delayed until the end of the trial (the delayed-feedback condition). The removal of immediate visual feedback prompted a stronger reliance on internal monitoring processes, which resulted in lower correction rates and a clear error-related negativity. Compatible with domain-general monitoring views, an error positivity was only recovered under conditions where errors were detected or had a high likelihood of being detected. Examination of the feedback-related indices (feedback-related negativity and frontocentral positivity) revealed a two-stage process of integration of internal and external information. The recovery of a full range of well-established EEG indices of action monitoring in a language production task strongly endorses domain-general views of monitoring. Such indices, in turn, are helpful in understanding how information from different monitoring channels is combined.
When speakers make speech errors, they are usually able to notice and correct them. The most obvious way to detect an error is to hear the word that was spoken and realize that it does not match the word that was intended. This process involves monitoring of the auditory output of speech, also called monitoring through the “external channel” (e.g., Hartsuiker & Kolk, 2001; Levelt, 1983). The external channel alone, however, is not sufficient to explain fast and efficient repairs often observed in language production, which has led to the proposal of an additional “internal channel” (Levelt, 1983). Over the past decades, various mechanisms have been proposed for how the internal channel functions, including comprehension-based monitors (e.g., Levelt, 1983, 1989), production-based (e.g., Nozari, Dell, & Schwartz, 2011; Van Wijk & Kempen, 1987; Laver, 1980), and forward models (e.g., Pickering & Garrod, 2013; Hickok, 2012; Tourville, Reilly, & Guenther, 2008). Despite their differences, all of these accounts propose mechanisms capable of detecting an error before it becomes an external stimulus for processing by the perceptual systems (see Nozari & Novick, 2017; Postma, 2000, for reviews).
In parallel, there has been increasing evidence that monitoring in language production involves many of the same brain regions involved in monitoring action in other domains and perhaps follows the same domain-general computations (Nozari & Novick, 2017). It is, therefore, reasonable to expect generally similar electrophysiological indices for language monitoring and monitoring in other domains. This paper combines the two approaches by investigating the EEG indices of action monitoring in language production with and without the contribution of the external channel. Typing is chosen instead of spoken production because (a) precise time-locking of EEG waveforms to the production of single segments (i.e., keystrokes) is possible in typed, but not in spoken, production, and (b) removal of the visual output at the time of production is also possible in typed but not in spoken production. The results provide a first comprehensive view of the EEG correlates of monitoring with and without feedback in typing and may have implications for monitoring in language production in general.
Monitoring through the Internal versus the External Channel
In spoken production, when auditory processing of the production output is blocked by noise, speakers are still capable of detecting a large number of their errors (e.g., Hartsuiker, Kolk, & Martensen, 2005; Postma & Kolk, 1992; Lackner & Tuller, 1979). Some studies have reported lower correction rates under noise-masked conditions (Oomen & Postma, 2001; Postma & Kolk, 1992), but this finding has not always been replicated (Nooteboom & Quené, 2017). The most striking demonstration of the involvement of the external channel in speech monitoring was provided by a study in which words produced in a Stroop task were occasionally replaced by other words (e.g., “gray” for “green”) through the manipulation of auditory feedback (Lind, Hall, Breidegard, Balkenius, & Johansson, 2014). The authors reported that, under ideal playback conditions (short lag between production and playback), participants accepted the manipulated word on 68% of trials, thereby admitting to an error they had not made solely based on the information provided by the external channel. These results suggest an interesting division of labor between the internal and external monitoring channels, but noise masking in spoken production has two problems. First, it does not fully block the processing of speech output through the perceptual system because of bone conduction, unless the noise is very loud. Loud noise, on the other hand, may change monitoring performance for reasons other than simply blocking the auditory input, such as distracting or distressing the speaker. Second, the impaired hearing induced by noise masking activates the Lombard reflex (Lane & Tranel, 1971), causing alterations to speech such as the use of louder voice with higher fundamental frequency and slowed articulation.
Typing, on the other hand, is not subject to the same problems: Visual feedback can be easily delayed without adverse effects. Logan and Crump (2010) manipulated visual feedback in a copy-typing task, in the same way as described above for Lind et al.'s (2014) study. Similar to Lind et al.'s results, the authors found that typists tended to accept the manipulated output as their own production, pointing to the influence of the external monitoring channel. This acceptance, however, was not complete, and a good proportion of manipulated trials were detected as having been tampered with, when participants were given the option to label them as such. This finding shows that the internal channel is capable of overriding the external channel, and that the artificial manipulation of feedback may induce unusual behavior in monitoring. Simple removal of visual feedback has also been conducted in typing, but the experiments either did not allow for natural corrections (e.g., Snyder, Logan, & Yamaguchi, 2015) or did not have a comparison with typing with visual feedback (Kalfaoğlu & Stafford, 2014). Moreover, all these studies have used copy typing, which does not require retrieving the orthographic word form from memory (Bonin, Méot, Lagarrigue, & Roux, 2015). In the current study, we used the same general logic as noise-masked studies in speech production and compared typing with and without visual feedback, in a spelling-to-dictation task in which participants were allowed to type (and correct) as they would in everyday life.
It is important to note that the label “external channel” is being used here to refer to the mechanisms that process the final outcome of production; that is, the auditory word in spoken production and the visual word in typed production. We do not mean to imply that these are the only perceptual consequences of production available for monitoring. In fact, the mere act of motor production creates movements with perceptual consequences that are processed by the proprioceptive system. Most monitoring accounts, however, maintain that monitoring through proprioception involves an internal mechanism (e.g., a forward model of estimating the perceptual consequences; Guenther, 2016), making it distinct from processing of the final output. Moreover, blocking noise in the studies reviewed above does not remove the proprioception of speaking the word, yet it has consequences for monitoring, showing the independent contribution of what has been called the external channel. This work is focused on exploring the contribution of this channel to monitoring in typing.
Neural Correlates of Monitoring
Although the classic theory of speech monitoring, the perceptual loop (Levelt, 1983, 1989), posited a special mechanism for the internal channel, that is, monitoring of inner speech through the comprehension system, the more recent theories of monitoring have shifted more toward a domain-general perspective (Pickering & Garrod, 2013; Hickok, 2012; Nozari et al., 2011; Tourville et al., 2008). Although these theories differ in the precise mechanism(s) through which monitoring is accomplished, which may, in part, be due to the differences in the scope of the models (e.g., motor production vs. lexical-semantic processing), they all apply the same computations involved in monitoring in the action domain to language production. Part of the motivation for the emergence of these domain-general views of monitoring has been the discovery of the shared neural correlates between language and action monitoring. Some of these areas are the ACC (Debener et al., 2005; Gehring & Willoughby, 2004; Herrmann, Römmler, Ehlis, Heidrich, & Fallgatter, 2004), the posterior cingulate cortex (Charles, Van Opstal, Marti, & Dehaene, 2013), the SMA (Bonini et al., 2014), and perhaps the pFC, insula, cerebellum, thalamus and basal ganglia (Fiez, 2016; Bourguignon, 2014; Fedorenko, Duncan, & Kanwisher, 2013; Riès, Xie, Haaland, Dronkers, & Knight, 2013; Bohland, Bullock, & Guenther, 2010; Marvel & Desmond, 2010; Christoffels, Formisano, & Schiller, 2007).
The involvement of domain-general regions in language monitoring generates the expectation that the ERP signatures of monitoring should be, at least to some extent, shared between language and action monitoring. ERP indices of monitoring fall under the two general groups of “error-related” and “feedback-related” components. Error-related indices include two major components: the error-related negativity (ERN or Ne; Gehring, Goss, Coles, Meyer, & Donchin, 1993) and the error positivity (Pe; Falkenstein, Hohnsbein, Hoormann, & Blanke, 1991). The feedback-related indices include the feedback-related negativity (FRN; Miltner, Braun, & Coles, 1997) and the more recently described frontocentral positivity (FCP; Butterfield & Mangels, 2003). In keeping with the domain generality prediction, at least one of these components, the ERN, has been recovered in picture naming tasks (e.g., Riès, Janssen, Dufau, Alario, & Burle, 2011; Ganushchak & Schiller, 2008b; see Nozari & Pinet, 2020). The rest of the components remain understudied in language production. Below, we briefly review these components and their interpretations in the nonlinguistic literature. We then look for the same components in typed production.
The ERN is a negative component with a frontocentral distribution, arising in a 100-msec window around an error. Although the ERN's magnitude can be sensitive to awareness, it is also found on trials without conscious awareness of the errors (Wessel, 2012), suggesting that it is an early and automatic process. Its magnitude is comparable to correct trials when participants are unaware of their errors and classify them as correct with high confidence (Shalgi & Deouell, 2012), suggesting that the ERN arises from a mechanism that estimates the “likelihood” of an error; if such likelihood is estimated (correctly or incorrectly) to be close to zero, no prominent ERN is observed. Especially relevant for our purposes, the ERN has been reported in tasks using linguistic material, such as semantic categorization (Balass, Halderman, Benau, & Perfetti, 2016), lexical decision (Ito & Kitagawa, 2006; Sebastian-Gallés, Rodríguez-Fornells, De Diego-Balaguer, & Díaz, 2006), phoneme monitoring (Ganushchak & Schiller, 2008a), as well as picture naming (Riès et al., 2011, 2013; Ganushchak & Schiller, 2008b) and tongue twister tasks (Acheson & Hagoort, 2014; Möller, Jansma, Rodríguez-Fornells, & Münte, 2007). The ERN is also found in high-conflict conditions in the absence of overt errors (e.g., Acheson, Ganushchak, Christoffels, & Hagoort, 2012).
All these studies, however, reported an ERN locked to the vocal onset of the word but not to each segment (i.e., phoneme or letter), even though the errors on phonemes or letters may reflect selection problems at the level of segments. To our knowledge, only one study has reported an ERN time-locked to each segment in language production: Kalfaoğlu, Stafford, and Milne (2018) reported higher amplitude of the ERN on corrected compared with uncorrected letters in a sentence copying task. However, they reported an ERN on a component isolated by independent component analysis (ICA), which was itself selected based on its activity on corrected trials. The choice of the ICA component based on a condition that was later included in the comparison creates a bias that may affect the findings.
Following the ERN, a centroparietal component called the error-related positivity (Pe) arises from 100 to 300 msec posterror (Gehring et al., 1993; Falkenstein et al., 1991). Unlike the ERN, the Pe does not seem to be generated unless participants are consciously aware of having made an error (O'Connell et al., 2007; Nieuwenhuis, Ridderinkhof, Blom, Band, & Kok, 2001). More recently, Boldt and Yeung (2015) reported a graded modulation of the Pe by confidence ratings in a perceptual decision-making task, which ties the Pe closely to metacognitive judgments, further indicating that it is an index of conscious error detection. Despite their frequent co-occurrence—at least on trials where participants are aware of having made an error—evidence suggests that the ERN and Pe are separable components, which arise from different regions (ERN from SMA and caudal ACC and Pe from rostral ACC; Herrmann et al., 2004), and may represent different error monitoring processes or systems (Hughes & Yeung, 2011; Herrmann et al., 2004; Nieuwenhuis et al., 2001).
Although there are numerous reports of the ERN in linguistic tasks, explicit investigations of the Pe are scarce. Masaki, Tanaka, Takasawa, and Yamazaki (2001) reported a higher positive amplitude following errors than correct trials with a centroparietal topography but in a later time window than the usual Pe, whereas Riès et al. (2011) failed to find such an effect. Kalfaoğlu et al. (2018) reported a positive component of higher amplitude for corrected keystrokes than uncorrected and correct keystrokes in typing. This component followed the ERN on the same ICA-isolated component and could be an early Pe. However, the analysis used by the authors imposed the strict constraint that the two components would have a similar brain localization, whereas previous evidence rather suggested that the ERN and Pe might arise from different regions. Looking at the full brain activity is thus necessary for evaluating the presence of a centroparietal Pe. Whether the Pe can index conscious detection of errors in a generative task such as language production in the same way as forced-choice tasks remains to be established.
The FRN is a feedback-locked index of performance monitoring, observed over frontocentral electrodes, with higher amplitude following negative than positive feedback (Miltner et al., 1997). The amplitude of the FRN is higher the more informative the feedback is (Arbel, Goforth, & Donchin, 2013). In tasks that require explicit associations (e.g., paired-associates learning task), the amplitude of the FRN is positively correlated with future learning, such that items that generated larger FRNs were better recognized later (Luft, 2014; Arbel et al., 2013). The ERN and FRN have many similarities (e.g., Van Schie, Mars, Coles, & Bekkering, 2004; see Holroyd & Coles, 2002, for a review), and they are thought to index internal and external monitoring, respectively. In line with this view, in the course of learning a new task, the amplitude of the ERN increases while the amplitude of the FRN decreases—indicating a shift from reliance on external feedback toward monitoring based on internal information processing (Eppinger, Kray, Mock, & Mecklinger, 2008; Holroyd & Coles, 2002). As far as we know, no studies have looked at the FRN in the context of language production.
An underreported component, called the FCP, follows the FRN after feedback presentation and is associated with long-term learning (Arbel & Wu, 2016; Arbel et al., 2013; Butterfield & Mangels, 2003). In a paired-associate learning paradigm, Arbel and Wu (2016) showed that larger FCP amplitude to negative feedback was predictive of better learning at the end of the task. In a two-session semantic knowledge task, Butterfield and Mangels (2003) found that the amplitude of the FCP on the first session was correlated with the rate of correction on the second session, pointing again to the relation between the FCP and learning. Similar to the FRN, we are unaware of any studies that have reported the FCP in language production.
The Current Study
This study investigates the monitoring of segmental errors, that is, any kind of error that involves a segment (a letter) in typing. If a word contained multiple errors, only the first error was analyzed, to avoid contamination of later errors by posterror processes triggered by earlier ones. The Appendix includes a definition of error types with examples, as well as additional information about specific error types in this study. As explained earlier, using typing instead of speaking has the advantage of easily blocking external (i.e., visual) feedback. In addition, it is possible to easily time lock ERPs to individual keystrokes. A third advantage is that we could elicit a fair number of errors in a natural task (typing-to-dictation), which allows us to better extend the findings to processes involved in everyday typing. The results, however, may have implications beyond the typing modality. A large body of research has shown the similarities between production patterns across different output modalities such as spoken, hand-written, and typed production (e.g., Muscalu & Smiley, 2019; Breining, Nozari, & Rapp, 2016; Nozari, Freund, Breining, Rapp, & Gordon, 2016; Pinet, Ziegler, & Alario, 2016; Damian & Freeman, 2008). A recent study in our lab has specifically demonstrated largely similar patterns of lexical and segmental errors in spoken and typed production, despite differences such as sensitivity to phonotactic constraints, which is, unsurprisingly, much more prominent in spoken production (Pinet & Nozari, 2018). These similarities suggest similar underlying mechanisms for language production at the stages of lexical selection and segmental encoding across different modalities. Together with the domain-general perspective on monitoring embraced by various accounts of monitoring in language production, the findings of this study could thus shed light not only on monitoring during typing but more broadly on monitoring mechanisms in language production in general.
In a simple typing-to-dictation task, participants typed the words they heard in two conditions. In the immediate-feedback condition, they saw what they typed on the screen in real time. In the delayed-feedback condition, the screen remained blank while they typed and what they had typed appeared with a delay at the end of the trial. The delayed-feedback condition also included a metacognitive judgment, that is, awareness of errors: After they typed the word but before seeing it on the screen, participants were asked to indicate whether they believed they had made an error or not. Comparison of correction rates in the immediate- and delayed-feedback conditions allows us to evaluate the contribution of the external channel to segmental corrections. Because missed corrections may reflect errors that were not detected, we also compare the rate of corrections to the rate of error awareness in the metacognitive judgment task, in the delayed-feedback condition. A low correction rate despite a higher rate of error awareness would suggest that errors had been detected but not corrected.
EEG analysis investigated the standard indices of action monitoring time-locked to each keystroke. Because the ERN indexes internal monitoring, we expect to find it when the main channel available is the internal channel (i.e., the delayed-feedback condition). Its presence in the immediate-feedback condition might depend on the contribution of internal monitoring when external monitoring is also available. If the Pe indexes conscious detection of an error in language production, similar to nonlinguistic tasks, we would expect to observe it when an error is detected. The FRN and FCP can be investigated in the delayed-feedback condition, time-locked to the delayed presentation of feedback. Crucially, this condition allows us to understand how the system combines internal and external feedback.
Seventeen right-handed native English speakers (10 women, age = 25.4 ± 4.5 years) took part in this study. They gave informed consent under a protocol approved by the institutional review board of Johns Hopkins Medicine and were compensated for their time.
Stimuli and Apparatus
We selected 352 seven- and eight-letter nouns from the English Lexicon Project database (Balota et al., 2007). Plural and compound words as well as words that had homophones, phonological or orthographic neighbors, were not included in the material. Words were divided into two lists of 176 words, balanced on psycholinguistic (word frequency, length, number of syllables and phonemes, mean bigram frequency) as well as motoric variables (laterality of the first letter on the keyboard, percentage of bigrams typed with two hands). Words were recorded by a female native speaker of American English. Participants' responses were recorded using a DirectIN PCB v2016 Empirisoft keyboard, which provides millisecond accuracy for keystrokes.
Participants performed a typing-to-dictation task in two conditions (Figure 1). In the immediate-feedback condition, participants saw what they were typing on the screen in real time. In the delayed-feedback condition, a fixation cross was presented on the screen during typing, and what they had typed appeared only at the end of the trial. Conditions were blocked, and participants saw them in a counterbalanced order. One of the two balanced lists of stimuli was used for each feedback condition. The list used for each condition was also counterbalanced between participants.
A trial unfolded in a similar way in both conditions. Participants heard a word, followed by a beep (500 Hz, 100 msec). They were instructed to type the word as fast and as accurately as possible when they heard the beep and to finish typing before a second beep (300 Hz, 100 msec) was presented, 1800 msec after the first one. They had 200 msec after the second beep to finish typing their answer, making it a total typing duration of 2000 msec from the first beep before the trial was terminated. In the immediate-feedback condition, what participants had typed stayed on the screen 300 msec after the end of the trial, during which they could not edit their answer anymore. This was done to avoid visual potentials elicited by the change of display from contaminating the signal of interest. Then, there was a 1.5-sec interval (1000 msec fixation cross, 500 msec blank screen) before the next word was presented. In the delayed-feedback condition, the screen remained blank during typing. At 500 msec after the end of the trial, participants were asked to judge the accuracy of their response by pressing a key with their right hand (“L” for a correct response, “J” for an incorrect response). What they had typed during the trial appeared 200 msec later and stayed on the screen for 1000 msec. The intertrial interval was the same as the immediate-feedback condition. There were three breaks within each block.
EEG Recordings and Preprocessing
EEG data were recorded using a 128-channel HydroCel Geodesic Sensor Net (EGI System) with a vertex reference. Data analysis was performed using Brainstorm (Tadel, Baillet, Mosher, Pantazis, & Leahy, 2011). Participants' performance was closely monitored during the task by the experimenter. As EEG data quality depends on avoiding unnecessary head and eye movements, the experimenter ensured that participants maintained fixation on the screen while typing and did not look back and forth at their hands and the screen. This procedure also ensured that participants did not simply replace one mode of visual feedback (the word on the screen) with another (looking at the keys) in the delayed-feedback condition. Data were filtered offline (0.1–100 Hz) and re-referenced to the averaged mastoids (or one mastoid reference if one of the two mastoid electrodes was too noisy, as was the case for three participants). Electrodes that were noisy for the whole experimental session were rejected and interpolated. Blinks were corrected using ICA (Infomax algorithm; Delorme, Palmer, Onton, Oostenveld, & Makeig, 2012). Residual artifacts were rejected via visual inspection.
Data were segmented into two time windows of interest, time-locked to (1) each keystroke (−500 to +500 msec) and (2) feedback presentation (−300 to +1000 msec, only in the delayed-feedback condition). For the latter window, a baseline was taken from −200 to 0 msec and removed by subtraction. Around keystrokes, we followed another procedure. Data inspection revealed that, from the first to the last keystroke of the word, the EEG signal presented a linear positive trend. This means that amplitude was higher (in absolute values) at the end of the word than at the beginning. To collapse data over keystroke positions, we had to normalize for the position of the keystroke within the word. Because we were interested in the error keystrokes, we subtracted the averaged signal for correct keystrokes from the averaged signal for error keystrokes for each position. This procedure replaced baseline removal for the window time-locked to keystrokes.
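The position normalization described above can be sketched as follows. This is a minimal illustration and not the authors' Brainstorm pipeline: the function names, the dictionary-of-epochs layout, and the unweighted averaging over keystroke positions are all assumptions made for the example.

```python
from statistics import fmean


def mean_waveform(epochs):
    """Average a list of equal-length epochs sample by sample."""
    return [fmean(samples) for samples in zip(*epochs)]


def normalized_error_signal(error_epochs, correct_epochs):
    """Error-minus-correct difference wave, averaged over keystroke positions.

    Both arguments map a keystroke position (1 = first letter) to a list of
    epochs; each epoch is a list of amplitudes for a single electrode.
    Assumption: positions are weighted equally in the final average.
    """
    differences = []
    for position, err in error_epochs.items():
        cor = correct_epochs.get(position)
        if not err or not cor:
            continue  # skip positions that lack either trial type
        err_avg = mean_waveform(err)
        cor_avg = mean_waveform(cor)
        differences.append([e - c for e, c in zip(err_avg, cor_avg)])
    return mean_waveform(differences)


# Toy data (hypothetical): the signal drifts upward with position, and
# errors add a deflection of -1 at the middle sample.
correct = {1: [[1, 1, 1]], 2: [[2, 2, 2]]}
error = {1: [[1, 0, 1]], 2: [[2, 1, 2]]}
print(normalized_error_signal(error, correct))  # [0.0, -1.0, 0.0]
```

In the toy data, the subtraction removes the position-dependent drift and leaves only the error-related deflection, which is the purpose of the normalization.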
Typed responses were classified as errors if any keystroke was different from the target. RT was computed as the time between the start beep (i.e., end of auditory stimulus) and the first keystroke. Interkeystroke intervals (IKIs; i.e., the time between two keystrokes) were averaged over each word and taken to index typing speed. Data more than 2.5 SD above or below each participant's mean RT or IKI were excluded. In total, 2.3% and 2.7% of data were excluded for RT and IKI analyses, respectively. Accuracy rates, RTs, and IKIs were compared using nonparametric statistical tests (Wilcoxon signed-rank test), which are robust to violations of the assumptions of parametric tests.
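As a rough illustration of the IKI measure and the 2.5 SD exclusion rule, the sketch below works on the keystroke timestamps of one word and the per-word values of one participant. The function names are hypothetical, and two details are assumptions because the text does not specify them: the sample standard deviation is used, and the trimming is applied in a single pass.

```python
from statistics import fmean, stdev


def mean_iki(keystroke_times_ms):
    """Mean interkeystroke interval for one word, from keystroke timestamps."""
    intervals = [b - a for a, b in zip(keystroke_times_ms, keystroke_times_ms[1:])]
    return fmean(intervals)


def exclude_outliers(values, n_sd=2.5):
    """Keep values within n_sd standard deviations of one participant's mean.

    Assumptions: sample SD (n - 1 denominator), single pass over the data.
    """
    center, spread = fmean(values), stdev(values)
    return [v for v in values if abs(v - center) <= n_sd * spread]


# Hypothetical keystroke times (msec from the start beep) for one word
print(mean_iki([310, 430, 560, 680]))
```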
“Error awareness” was determined based on the metacognitive judgment in the delayed-feedback condition (pressing the “correct” button indicated that the participant had not detected a response as an error, whereas pressing the “incorrect” button indicated that they had detected their response as an error). “Error correction” was indexed by the use of backspace (which must accompany all corrections in typing regardless of the accuracy of the outcome). To quantify monitoring performance, we used the framework of the signal detection theory (Galvin, Podd, Drga, & Whitmore, 2003; Clarke, Birdsall, & Tanner, 1959), which has also been successfully applied to explaining both monitoring and selection processes in language production (Nozari & Hepner, 2018; Nozari et al., 2011). In particular, signal detection theory Type 2 is used for metacognitive decision about one's own performance. Combining the actual accuracy of the response (correct or error trial) with the accuracy of the metacognitive judgments (error detected or not) gives four categories of trials: correct rejection (correct trials judged as correct), hit (error trials judged as error), miss (error trials judged as correct), and false alarm (correct trials judged as error). An efficient monitor is one that has a high hit rate and a low false alarm rate. To capture monitoring performance uncontaminated by response bias (i.e., tendency toward answering “correct” leading to more misses and fewer false alarms or vice versa), we computed d′ as z(hit rate) − z(false alarm rate), where z is the z score of the cumulative normal distribution for the two response types (Macmillan & Creelman, 1991). A lower d′ indicates poorer monitoring behavior.1
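The d′ formula above can be sketched in a few lines. Here z is taken to be the inverse of the standard normal cumulative distribution function. The function name and argument layout are illustrative assumptions. The example feeds in the pooled correction counts from Table 1, so the values it prints will not match the per-participant averages reported in the Results.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Type 2 sensitivity: z(hit rate) - z(false alarm rate).

    Note: inv_cdf raises StatisticsError for rates of exactly 0 or 1;
    no correction for such extreme rates is applied in this sketch.
    """
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Pooled correction counts from Table 1 (illustration only)
immediate = d_prime(hits=354, misses=408, false_alarms=11, correct_rejections=2043)
delayed = d_prime(hits=62, misses=593, false_alarms=8, correct_rejections=2153)
print(round(immediate, 2), round(delayed, 2))
```

Because d′ subtracts the two z scores, a participant who simply presses "incorrect" more often raises both rates together and gains little in d′, which is why the measure separates sensitivity from response bias.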
Analyses focused on indices of monitoring, namely, the ERN, Pe, FRN, and FCP. We started with broad predetermined windows based on previous studies and refined them using visual observation and global field power (GFP). GFP is an index of the spatial standard deviation of the data and corresponds to the root mean square of the grand average over all electrodes. GFP is used to describe the general time course of the signal, extract peaks of interest, and compare conditions (e.g., Python, Fargier, & Laganaro, 2018). Once a GFP peak had been identified, nonparametric tests (Wilcoxon signed-rank test, unless otherwise specified) were performed on GFP amplitudes averaged over a 40-msec time window (unless otherwise specified) centered on the peak. Our approach was very similar to an omnibus ANOVA: If an effect was significant on the GFP amplitudes, then further tests, or “contrasts,” were run on all electrodes to uncover the spatial distribution of the effect. To do so, mean ERP amplitudes of all electrodes in the time window where GFP amplitudes displayed a significant effect were compared with permutation Wilcoxon tests. However, if no peak could be readily identified on GFP, ERP amplitudes were compared point-by-point over the whole time window. If there was only one condition (e.g., around each keystroke), one-sample t tests against zero were used. Unless stated otherwise, a threshold of p < .05 was taken for all statistical tests. To provide a measure of the consistency of the effect across participants, we plotted, for each electrode and time point, the proportion of participants who presented a difference between conditions in the same direction as the grand averages, and reported the maximum percentage (Rousselet, Foxe, & Bolam, 2016).2
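The quantities used in this procedure can be sketched as below: GFP at each time point, the mean over a window centered on a peak, and the across-participant consistency measure at one electrode and time point. This is an illustrative sketch, not the Brainstorm implementation. The function names are hypothetical, GFP is computed as the population standard deviation across electrodes (which equals the root mean square only for average-referenced data), and the window half-width is given in samples because the sampling rate is not stated here.

```python
from statistics import fmean, pstdev


def global_field_power(grand_average):
    """GFP per time point: standard deviation across electrodes.

    grand_average is a list of electrodes, each a list of amplitudes over time.
    """
    return [pstdev(samples) for samples in zip(*grand_average)]


def window_mean(series, peak_index, half_width):
    """Mean of a series over a window of samples centered on peak_index."""
    start = max(0, peak_index - half_width)
    return fmean(series[start:peak_index + half_width + 1])


def consistency(differences):
    """Proportion of participants whose condition difference has the same
    sign as the grand-average difference (one electrode, one time point)."""
    grand = fmean(differences)
    same_direction = [d for d in differences if d * grand > 0]
    return len(same_direction) / len(differences)


# Toy grand average (hypothetical): 2 electrodes x 2 time points
gfp = global_field_power([[1, 2], [-1, -2]])
peak = max(range(len(gfp)), key=gfp.__getitem__)  # index of the GFP peak
print(gfp, peak, window_mean(gfp, peak, half_width=1))
print(consistency([1.0, 2.0, -1.0, 3.0]))
```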
Participants' ability to type fast and accurately was assessed through an online typing test consisting of copying a short text. Participants who typed faster than 60 words per minute and made fewer than five mistakes entered the study. After the completion of the study, one participant was excluded because he failed to reach 40% accuracy on the experimental task. On average, participants in the remaining sample typed at a speed of 86.4 ± 12 words per minute and made 0.75 ± 0.9 errors in the typing test.
In total, 5632 responses (2816 per condition) were analyzed (see Figure 2). The accuracy rate was not significantly different in the immediate- and delayed-feedback conditions (M(immediate) = 72.5 ± 13%; M(delayed) = 76.5 ± 14%; Z = −1.53, p = .133). RTs were also not significantly different by condition (M(immediate) = 309 ± 35 msec; M(delayed) = 326 ± 41 msec, Z = −1.5, p = .144). Typing speed as measured by mean IKIs was, however, significantly slower in the delayed- than the immediate-feedback condition (M(immediate) = 125 ± 19 msec; M(delayed) = 134 ± 16 msec; Z = 2.64, p = .006).
We then evaluated whether error correction depends on feedback availability (Table 1). Whereas 42% of errors were corrected in the immediate-feedback condition, only 9% were corrected in the delayed-feedback condition (i.e., a 79% relative reduction in the correction rate; Z = 3.36, p < .001). False alarm rates were not significantly different between conditions (delayed: 1.01%, immediate: 1.32%, Z = 0.98, p = .35). Average d′ was higher in the immediate-feedback condition (2.06 ± 0.73) compared with the delayed-feedback condition (0.89 ± 0.37, Z = 3.34, p < .001). Collectively, these results indicate substantially poorer correction performance in the absence of external feedback. This may mean that the errors were not detected or, alternatively, that they were detected but not corrected. To disentangle the two, we looked at metacognitive judgments, which index error awareness independently of error correction. Results showed that, in the delayed-feedback condition, 54% of errors were reported in the metacognitive judgments, significantly higher than the 9% that were corrected, Z = 3.52, p < .001 (Table 1). This finding indicates that many more errors in the delayed-feedback condition were detected than corrected.
|  | Not Corrected/Reported | Corrected/Reported |
| Correct | Correct rejection | False alarm |

|  | Correction: Immediate |  | Correction: Delayed |  | Awareness: Delayed |  |
|  | No correction | Correction | No correction | Correction | No detection | Detection |
| Correct | 2043 (99%) | 11 (1%) | 2153 (99.6%) | 8 (0.4%) | 2065 (96%) | 96 (4%) |
| Error | 408 (54%) | 354 (46%) | 593 (91%) | 62 (9%) | 300 (46%) | 355 (54%) |
The top panel specifies what data appear in each cell.
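As a rough illustration of how the sensitivity index and the relative change in correction rate relate to the numbers above, the sketch below computes d′ from the pooled correction counts in Table 1. Pooling counts across participants is an assumption made here for illustration only; the d′ values reported in the text are per-participant averages, so the pooled values are not expected to match them. The function name `d_prime` is hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index: d' = z(hit rate) - z(false alarm rate)."""
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    # Note: inv_cdf raises an error for rates of exactly 0 or 1,
    # so such cases would need a correction before calling it.
    return z(hit_rate) - z(fa_rate)


# Pooled correction counts from Table 1 (an illustrative assumption;
# the paper averages d' over participants instead of pooling).
d_immediate = d_prime(hits=354, misses=408, false_alarms=11, correct_rejections=2043)
d_delayed = d_prime(hits=62, misses=593, false_alarms=8, correct_rejections=2153)

# Relative decrease in correction rate, using the rates given in the text.
relative_decrease = (0.42 - 0.09) / 0.42

print(f"d' immediate (pooled): {d_immediate:.2f}")
print(f"d' delayed (pooled):   {d_delayed:.2f}")
print(f"relative decrease:     {relative_decrease:.0%}")
```

Even with pooled counts, the ordering matches the reported pattern: sensitivity is higher in the immediate-feedback condition than in the delayed-feedback condition.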
Careful artifact rejection led to the removal on average of 20.7 ± 11% of trials over all time windows of analysis.
ERN and Pe
Based on the previous literature, we looked for an ERN from −50 to 100 msec and for a Pe around 200–400 msec (e.g., Acheson & Hagoort, 2014; Ullsperger, Fischer, Nigbur, & Endrass, 2014) time-locked to each keystroke. Time-locking to each keystroke poses a challenge because keystrokes are separated by only 130 msec on average; thus, activity related to one keystroke may influence the activity related to neighboring ones. The only other study that attempted to analyze electrophysiological signal time-locked to individual keystrokes during continuous typing (Kalfaoğlu et al., 2018) did so on ICA components. To avoid the problems with this approach discussed in the earlier sections, we analyzed the full signal without assuming specific components.
Amplitudes were sensitive to the position of the keystroke within the word and increased steadily throughout the word. Therefore, we normalized the ERP signal extracted around error keystrokes. At each keystroke, we subtracted the averaged signal for correct keystrokes from the averaged signal for error keystrokes and then averaged the normalized signal over all keystroke positions. The resulting normalized signal time-locked to erroneous keystrokes, but independent of keystroke position, is displayed in Figure 3. Because the absence of immediate visual feedback may change processing even on correct trials, we did not directly compare the two feedback conditions.
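The normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: waveforms are represented as plain lists of amplitudes, one averaged waveform per keystroke position, and the function name `normalized_error_signal` is hypothetical.

```python
def normalized_error_signal(error_erps, correct_erps):
    """Average, over keystroke positions, of the error-minus-correct waveform.

    error_erps, correct_erps: lists indexed by keystroke position; each entry
    is an averaged waveform (list of amplitudes, one per time sample).
    """
    n_positions = len(error_erps)
    n_samples = len(error_erps[0])
    # Difference wave (error minus correct) at each keystroke position
    diffs = [
        [e - c for e, c in zip(err, cor)]
        for err, cor in zip(error_erps, correct_erps)
    ]
    # Average the difference waves across positions, sample by sample
    return [sum(d[t] for d in diffs) / n_positions for t in range(n_samples)]


# Toy data: baseline amplitude grows with keystroke position,
# while the error effect is a constant +1 at the second sample.
correct = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
error = [[1.0, 2.0, 1.0], [2.0, 3.0, 2.0], [3.0, 4.0, 3.0]]
print(normalized_error_signal(error, correct))  # [0.0, 1.0, 0.0]
```

In the toy data, the position-dependent increase in amplitude cancels out in the subtraction, leaving only the error-related effect.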
In the immediate-feedback condition, we were not able to identify a clear negative component resembling an ERN in the predefined time window (Figure 3A, B). There was, however, a centroparietal positive component similar to a Pe, starting around 150 msec postkeystroke (mean Cz amplitude = 2.4 ± 3.6 μV [150–250 msec]; mean Pz amplitude = 3.2 ± 4.1 μV [300–500 msec]; see Figure 3A). Consistent with the previous literature, the effect developed centrally first (Cz) and then appeared over parietal regions (Pz; see Figure 3A). Point-by-point one-sample t tests against zero on the ERP amplitudes confirmed an effect from 150 to 400 msec, with perfect consistency across participants (%max = 100%) and widespread over the central electrodes (Figure 3B).
In the delayed-feedback condition, we observed a negative frontocentral component similar to an ERN locked to error keystrokes (mean FCz amplitude = −1.2 ± 2.7 μV [0–100 msec]; Figure 3C, D). Point-by-point one-sample t tests against zero on the ERP amplitudes confirmed the significance of the frontocentral ERN, in a time window from −20 to 100 msec, consistent across participants (%max = 100%), focal in time, and centered around the frontal electrodes (Figure 3D). No clear positive component could be identified following the ERN at the group level, but because the Pe is considered an index of conscious awareness of errors, we further divided errors into reported (i.e., hits, 52.7% of trials) and unreported (i.e., misses, 47.3% of trials) based on the metacognitive judgments (Figure 3E). Consistent with the theories of the Pe, reported errors (hits) were associated with a centroparietal component similar to a Pe (mean Cz amplitude = 1.6 ± 6.6 μV [150–250 msec]). Unreported errors (misses), on the other hand, showed no clear Pe (mean Cz amplitude = −0.53 ± 3.8 μV [150–250 msec]). However, misses showed a negative frontocentral component similar to an ERN (mean FCz amplitude = −2.0 ± 4.1 μV [0–100 msec]). The contrast between hits and misses was significant in the Pe time window (150–350 msec), over central left electrodes, but not in the ERN time window.
In summary, we could identify a clear Pe but no clear ERN in the immediate-feedback condition. In the delayed-feedback condition, on the other hand, an ERN was observed, but no clear Pe could be identified when all trials were pooled together. However, breaking down errors by awareness revealed a Pe for hits (errors that participants had consciously detected), but not for misses (those that were not consciously detected).
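The point-by-point one-sample t tests against zero used above can be sketched as follows. This is a simplified illustration: it computes only the t statistic at each time sample from per-participant waveforms and does not include any correction for multiple comparisons or the consistency measure (%max) reported in the text. The function name `pointwise_t` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pointwise_t(waveforms):
    """One-sample t statistic against zero at every time sample.

    waveforms: one list of amplitudes per participant (all the same length).
    Returns one t value per sample, each with len(waveforms) - 1 degrees
    of freedom. Assumes nonzero variance across participants at each sample.
    """
    n = len(waveforms)
    t_values = []
    for sample in zip(*waveforms):  # all participants' amplitudes at one time point
        standard_error = stdev(sample) / sqrt(n)
        t_values.append(mean(sample) / standard_error)
    return t_values


# Toy data: four participants, two time samples.
# Sample 0 has a consistent positive deflection; sample 1 averages to zero.
toy = [[1.0, 0.5], [2.0, -0.5], [3.0, 0.5], [2.0, -0.5]]
t_vals = pointwise_t(toy)
print([round(t, 2) for t in t_vals])
```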
FRN and FCP
A clear separation of the external feedback monitoring from the internal monitoring processes is only possible in the delayed-feedback condition. We focused the analyses on the four types of responses obtained in the metacognitive judgment task. Based on the previous literature on visual feedback processing, we chose a predefined time window of 200–350 msec for the FRN (Krigolson, 2018) and a predefined time window of 500–700 msec for the FCP (e.g., Arbel & Wu, 2016). Figure 4 shows the results. All four trial types presented an early positive component first, peaking at 220 msec (mean FCz amplitude = 8.7 ± 4.2 μV [correct rejections]; 10.0 ± 6.6 μV [hits]; 11.2 ± 7.0 μV [misses]; 13.6 ± 8.6 μV [false alarms]; see Figure 4A, C). Hits and misses presented a negative component, similar to an FRN peaking at 300 msec (mean FCz amplitude = 8.3 ± 6.7 μV [hits]; 8.5 ± 8.1 μV [misses]; see Figure 4A). In contrast, no such negative component could be identified for false alarms. A later long-lasting positivity from 400 msec onward was observed for misses and false alarms with a similar amplitude (mean FCz amplitude = 13.5 ± 5.8 μV [misses]; 14.6 ± 6.0 μV [false alarms]), as well as for hits albeit with a lower amplitude (mean FCz amplitude = 9.2 ± 7.3 μV).
Based on the time course observed, we considered three time windows of interest for analysis: The first window examined the early positivity that was observed in our data but had not been explicitly reported in previous studies. The second window reflected the FRN, and the third window reflected the FCP (Arbel & Wu, 2016; Potts, Martin, Burton, & Montague, 2006; Butterfield & Mangels, 2003). Because the number of false alarms was considerably lower than the number of trials of any other type, statistical comparison of those trials against other trials is problematic. Hits and misses, however, both had a sufficient number of trials and were comparable to each other. We thus focused the analyses on these two trial types against correct rejections and against each other, using mixed linear regressions.
In the early positivity time window (200–240 msec), statistical analyses on GFP revealed significant differences in amplitude for all conditions compared with correct rejections (hits: β = 15.9, t = 2.1, p = .038; misses: β = 22.0, t = 2.9, p = .005; false alarms: β = 37.8, t = 4.9, p < .001). Importantly, the contrast between hits and misses was not significant, β = 5.9, t = 0.79, p = .44. Permutation tests on ERP amplitudes revealed that significant contrasts with correct rejections were mostly central or frontal for hits (Figure 4B) and consistent over participants (%max = 94% [hits], 94% [misses], 93% [false alarms]; Figure 4D).
The results in the FRN time window (280–320 msec) were very similar to the earlier time window. Here too the statistical analyses on GFP revealed significant differences in amplitude for all conditions compared with correct rejections (hits: β = 21.5, t = 2.8, p = .008; misses: β = 22.0, t = 2.8, p = .007; false alarms: β = 55.1, t = 6.8, p < .001). Again, the contrast between hits and misses was not significant (β = 0.50, t = 0.07, p = .95). Permutation tests on ERP amplitudes revealed that significant contrasts with correct rejections were present over central electrodes except for misses (Figure 4B) and consistent across participants (%max = 93%; Figure 4D). The between-participant correlation of ERN amplitude measured on FCz (0–100 msec) and FRN amplitude averaged over hits and misses was positive and highly significant (R = .72, p = .002).
In the FCP time window (500–700 msec), all trial types were again significantly different from correct rejections on GFP amplitudes (hits: β = 42.5, t = 4.0, p < .001; misses: β = 63.1, t = 5.9, p < .001; false alarms: β = 74.6, t = 6.7, p < .001). In this window, however, misses largely overlapped with false alarms and were marginally different from hits (β = −20.6, t = −1.9, p = .06). Permutation tests on ERP amplitudes showed that the spatial distribution of the contrast with correct rejections was again central for the three conditions (Figure 4B) and perfectly consistent across participants (%max = 100%) from 350 up to 650 msec for hits, from 350 to 800 msec for misses, and from 200 to 800 msec for false alarms (Figure 4D).
In summary, during feedback processing, correct rejections patterned differently from all other trial types in both early and late phases of processing. Interestingly, there was a switch between the pattern of hits and misses from early to late phases of feedback processing: In earlier phases, hits and misses were largely overlapping, suggesting similar processing of all objectively incorrect responses. In later stages, however, hits were processed differently from misses, the latter of which patterned closely with false alarms, suggesting a closer grouping of trials based on monitoring performance, as opposed to response accuracy.
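For readers unfamiliar with the GFP measure entering the window analyses above, the sketch below assumes the conventional definition of global field power: the standard deviation of amplitudes across electrodes at a given time point. Whether the study's pipeline used exactly this form is not stated in this section, and the function names and toy data are hypothetical.

```python
from statistics import mean, pstdev


def global_field_power(topography):
    """GFP at one time point: standard deviation of amplitudes across electrodes."""
    return pstdev(topography)


def mean_gfp_in_window(data, times, start, end):
    """Mean GFP over the samples whose time (msec) falls within [start, end].

    data: list of topographies, one per time sample
          (each a list of electrode amplitudes).
    times: list of sample times in msec, same length as data.
    """
    values = [
        global_field_power(topo)
        for topo, t in zip(data, times)
        if start <= t <= end
    ]
    return mean(values)


# Toy data: two electrodes, four time samples.
times = [200, 220, 240, 260]
data = [[1.0, -1.0], [2.0, -2.0], [3.0, -3.0], [10.0, -10.0]]
early_gfp = mean_gfp_in_window(data, times, start=200, end=240)
print(early_gfp)
```

Because GFP summarizes the strength of the scalp field at each time point in a single value, it allows window-based comparisons between trial types without selecting a specific electrode in advance.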
Comparison between typing with and without visual feedback showed that correction rates were significantly lower when external feedback was removed. Metacognitive judgments revealed that participants had actually detected many of the errors they had failed to correct in the delayed-feedback condition. Together, these findings imply a critical role for external feedback in error corrections. Similar to findings in spoken production (e.g., Postma & Kolk, 1992), and unlike the copy-typing task used by Snyder et al. (2015), the removal of visual feedback in our typing-to-dictation task did not increase error rates (although it did slow down typing). This difference shows the reliance of participants on visual stimuli during copy typing, an effect that is best avoided if true lexical retrieval and segmental encoding are the targeted cognitive processes. In keeping with speech studies, we found a decrease in the rate of corrections when the external channel was blocked; however, the magnitude of this decrease was much larger in typing (∼80%) compared with speaking (∼50% in Postma & Kolk, 1992; ∼17% in Oomen, Postma, & Kolk, 2001; and no significant decrease in Nooteboom & Quené, 2017). It is worth noting that a previous study reported higher correction rates than we do in a copy-typing task without visual feedback (Kalfaoğlu & Stafford, 2014), but that task did not include a temporal deadline, and instructions specifically emphasized that participants must correct all their errors. Despite this, a correction rate of only 60% was achieved, which could reflect the upper bound of corrections that can be achieved in the absence of the external channel.
The different rates of change in correction rates between spoken and typed production with and without external feedback may reflect the differential contribution of the internal versus external channel to monitoring in these two modalities. However, when artificial errors were inserted in spoken and typed production (Lind et al., 2014; Logan & Crump, 2010), they elicited similar degrees of acceptance, pointing to comparable reliance of participants on the external channel in the two production modalities. Given this finding, the larger drop in correction rates in typing compared with speaking likely reflects the fact that noise masking only partially blocks the external channel during speaking; complete removal of that channel may be associated with a greater decrease in correction rates in the spoken modality as well. Another possibility is that there is a greater need for the kind of information provided by the external channel for correcting errors in typing compared with speaking. Whereas correcting spoken words almost always entails restarting the word, correcting typed errors rarely entails erasing and retyping the whole word (see Appendix). Thus, information about the identity of the previous letter or the location of the error may be crucial for applying the appropriate correction. The role of these two factors in efficient corrections should be further explored in future studies.
Interestingly, the much lower correction rates in the absence of visual feedback were not accompanied by poor awareness of errors. Participants consciously detected about half of their errors, a rate comparable to previous studies in speech monitoring (Hartsuiker et al., 2005; Nooteboom, 2005; Postma & Noordanus, 1996), and comparable to the rate estimated by the conflict-based model of Nozari et al. (2011). This finding shows that (a) error awareness is not sufficient for correction and (b) the role of the external channel is much less prominent in error awareness than in error correction.
To summarize, the behavioral findings of this study showed similar patterns in monitoring of spoken and typed errors, extending previous results that demonstrated the similarities between the error patterns in the two modalities (Pinet & Nozari, 2018). Importantly, the external channel seems to play an important role in repairing errors, but this role is not tied to conscious detection of errors.
Connection to Models of Typing
The first typing model was implemented by Rumelhart and Norman (1982) and consisted of word schemata, keystroke schemata, and a response system. Once a word schema is activated, the model simulates the process of activating its corresponding keystroke schemata and ultimately the target positions on the keyboard. This was the first model to successfully capture the basic properties of typing such as speed and accuracy in skilled typists. A more recent and nuanced model of typing, the hierarchical processing model, was developed by Logan and Crump (2011) to capture an interesting dissociation observed in Logan and Crump (2010). As described in the Introduction, that study artificially altered a proportion of typed words. The first finding was that typists were willing to accept some of the inserted errors as their original responses, pointing to an influence of the external monitoring channel on monitoring behavior. The second finding was a typical slowing down after self-made errors, which was absent for inserted errors that were presumably processed through the external (visual) monitoring channel. This finding suggested reliance on the internal monitoring channel.
Logan and Crump (2011) formalized this dissociation by proposing a model with two informationally encapsulated loops: The “outer loop” is responsible for visually processing the input in copy-typing, encoding the word to type, and processing the visual feedback. The “inner loop” is responsible for the inner dynamics of typing, including mapping the word onto segments, controlling the sequential activation of keystrokes, navigating the position of fingers on the keyboard, and processing the proprioceptive feedback (see also Yamaguchi, Crump, & Logan, 2013). There are clear parallels between the internal and external monitoring channels that frame the backbone of the current study and Logan and Crump's (2011) inner and outer loops. They are, however, not identical.
One difference is the encapsulation of information between the two loops, which clearly separates lexical and sublexical processes. Pinet and Nozari (2018) provided evidence against such separation by showing that typing errors were affected by the feedback between sublexical segments and lexical representations, similar to interactive models of spoken production (e.g., Dell, 1986). This demonstration is relevant to the current discussion in that it changes the notion of “internal channel” from being confined to the inner loop to encompassing processes that include word retrieval and the interaction between segmental and lexical activation. This, in turn, suggests that internal monitoring may be carried out by multiple mechanisms, examples of which are conflict detection (Nozari et al., 2011), or internal canceling of the activation of perceptual information by forward production commands (Hickok, 2012).
Another potential difference is that the functioning of the external monitoring channel is, to a large extent, outside the primary processes that contribute to production. For example, regardless of the modality (spoken, hand-written, or typed), language production, broadly defined, entails activating semantic information and competing lexical representations, selecting the target representation among competitors, and mapping that target onto sublexical segments. All of this, according to psycholinguistic models, is carried out within the production system and independently of the external monitoring channel. The outer loop, on the other hand, as suggested by Logan and Crump (2011), plays a critical role in planning parts of the primary process of typing. This could be because the model's primary scope is copy-typing, in which the visual input plays a critical role in initiating the typing process. Our understanding, however, is that the outer loop is also involved in carrying out the processes required to retrieve the target word, in which case its role is clearly much more extensive than that of the external monitoring channel.
To summarize, the distinction between the internal/external monitoring channels finds an important parallel in the inner/outer loops in the hierarchical processing model. This parallel, however, is approximate, and the two labeling systems have differences in how they draw the boundaries that separate stages of processing.
Error-related Indices of Monitoring: ERN and Pe
The time courses of the negative and positive keystroke-locked components that we observed and their frontocentral and centroparietal topographies were consistent with the ERN and Pe components, respectively, described in the performance monitoring literature (Ullsperger et al., 2014). Although a clear ERN was uncovered in the delayed-feedback condition (see also Kalfaoğlu et al., 2018), we could not identify such a component in the immediate-feedback condition. On the other hand, a clear Pe was observed in the immediate-feedback condition when all errors were lumped together and compared against correct trials but was only evident on a subset of trials in the delayed-feedback condition in which participants were aware of the error.
The presence of an ERN in the delayed-feedback condition was expected because, despite the debates around its origin (e.g., Hewig, Coles, Trippe, Hecht, & Miltner, 2011; Ridderinkhof, van den Wildenberg, Segalowitz, & Carter, 2004; Yeung, Botvinick, & Cohen, 2004), the ERN has often been implicated as an index of internal monitoring, which this condition taps into. The absence of a clear ERN in the immediate-feedback condition was not predicted a priori, although some modulation of ERN was expected because both the internal and the external channels are available in the immediate-feedback condition. In past studies, the absence of an ERN has been reported when the stimulus could not be analyzed due to subliminal presentation (Charles et al., 2013) or when no correct response was available (Di Gregorio, Maier, & Steinhauser, 2018). Neither of these applies to the current experiment. Likewise, there are two factors that have been reported to increase the amplitude of the ERN: conscious detection of errors (Hughes & Yeung, 2011; Maier, Steinhauser, & Hübner, 2008) and higher likelihood of corrections (Gehring et al., 1993). Note, however, that error awareness is, if anything, higher in the immediate-feedback condition where participants could immediately see their errors. Similarly, the data show much higher correction rates in the immediate condition. Therefore, neither of these factors could explain the absence of an ERN in the immediate-feedback condition. Finally, a null effect may always arise from lack of statistical power, but a clear ERN was recovered from the delayed-feedback condition with a comparable number of errors, decreasing the probability of a null effect due to statistical problems. We thus cautiously interpret the differential pattern of ERN recovery between the immediate- and delayed-feedback conditions as suggesting relatively less reliance on monitoring through the internal channel when external feedback is available online during performance. 
Moreover, our results are much better aligned with the ERN being an index of an internal monitoring mechanism, as opposed to a mechanism that brings errors to conscious awareness or a mechanism that is primarily concerned with the availability of a correction, as has been suggested in some studies (Hughes & Yeung, 2011; Maier et al., 2008; Gehring et al., 1993). It remains possible, however, that both of these latter mechanisms could be indirectly related to the dynamics of internal monitoring.
Only one previous study has explicitly investigated the Pe component in language production. In a copy-typing task, Kalfaoğlu et al. (2018) reported a Pe immediately following the ERN on the same ICA component and with the same topography as the ERN. The Pe reported in that study most likely represents the early Pe, not the late Pe typically described in the action monitoring literature. The Pe component uncovered in the current study, however, is precisely what has been described in the action monitoring literature, in terms of timeline, topography, and sensitivity to awareness (see also Ganushchak & Schiller, 2008b; Masaki et al., 2001, for hints of the same components, although not explicitly analyzed). In piano playing, for example, Maidhof, Pitkäniemi, and Tervaniemi (2013) found a Pe (200–350 msec) with a centroparietal topography, which closely matches the timeline and topography of the Pe in the current study. As described in the Introduction, the Pe has been described as an index of conscious awareness of errors in nonlinguistic tasks (Boldt & Yeung, 2015; Steinhauser & Yeung, 2010; O'Connell et al., 2007; Overbeek, Nieuwenhuis, & Ridderinkhof, 2005). Based on this, we expected the Pe to be most prominent when participants were most likely to be aware of their errors. In keeping with this prediction, a robust Pe was found in the immediate-feedback condition, where the presence of visual feedback during typing provides a strong cue for becoming aware of an error. In the delayed-feedback condition, such a cue was absent. Correspondingly, we only observed a Pe on a subset of trials in which participants reported having been aware of errors. Compatible with our results, Ruiz, Jabusch, and Altenmüller (2009) manipulated the presence of auditory feedback during piano playing and observed that the Pe had a larger amplitude with auditory feedback than without.
Collectively, our findings regarding the ERN and the Pe confirm and extend previous claims that monitoring in language production is carried out by the same domain-general processes as other systems, albeit operating on domain-specific representations (Nozari & Novick, 2017; Hanley, Cortis, Budd, & Nozari, 2016; Nozari et al., 2011; Riès et al., 2011; Ganushchak & Schiller, 2008b). To be specific, the claim is that monitoring mechanisms are based on the operations carried out on domain-specific representations (e.g., linguistic, visual) and are thus sensitive to domain-specific properties (see Freund & Nozari, 2018, for an extension to the implementation of control). Even within a specific domain such as language, monitoring may be sensitive to differences in the nature of representations, an example of which is the double dissociation between detecting lexical and sublexical errors in individuals with aphasia (Nozari et al., 2011) or the specificity of lexical indices of monitoring for the detection of lexical (but not sublexical) errors in children (Hanley et al., 2016). In the same vein, it is likely that monitoring in typing is sensitive to the specific representations involved in typing, such as those linking segments to target positions on the keyboard (keypress schemata; Rumelhart & Norman, 1982). However, a purely domain-specific monitoring system would not be expected to have the exact same electrophysiological signature across different domains. The report of such a common signature as the ERN in verbal and nonverbal tasks in the past studies (e.g., Riès et al., 2011), complemented by the more comprehensive report of several such signatures (ERN, Pe, FRN, and FCP) in the current article, argues for a central domain-general component to monitoring mechanisms across various domains, thus ruling out purely domain-specific monitoring accounts.
The dissociations observed in the patterns of ERN and Pe have further implications for monitoring via the internal and external channels: When the external channel is removed, monitoring relies heavily on the internal channel. This generates a clear ERN, but overall lower confidence in having made an error, because no external cues are available to bring errors to conscious attention. When, on the other hand, the external channel is available, it is relied upon by participants for monitoring. This, in turn, decreases the reliance on the internal channel (no clear ERN) but increases error awareness through external cues (clear Pe).
Feedback-related Indices of Monitoring: FRN and FCP
The delayed-feedback condition provides the ideal context for examining feedback-related ERPs, as visual feedback is temporally separated from the operation of internal monitoring mechanisms. Moreover, this separation allows us to understand how the system integrates the output of internal monitoring channel with the external channel. We were able to uncover both a reliable FRN and a reliable FCP in this experiment. This is, to our knowledge, the first report of such components in the context of language production. The most striking finding, however, was the discovery of two phases of feedback processing and integration. In the early phase (first and second windows of analysis), that is, the FRN time window, we did not find a difference in the amplitude of the FRN between hits and misses, suggesting that both detected and undetected errors were treated similarly at this stage and differently from the majority of correct responses. It is worth mentioning that we found the same pattern that differentiates between error and correct trials in an earlier time window, which has sometimes been observed, but has not been explicitly reported (Glazer, Kelley, Pornpattananangkul, Mittal, & Nusslock, 2018; Eppinger et al., 2008; Butterfield & Mangels, 2003). The patterning of hits and misses together may suggest that the first phase of feedback processing concerns the objective status of response (correct/incorrect), with incorrect responses eliciting a different brain response than correct responses.
The above interpretation, however, would predict that false alarms (which are objectively correct responses) would pattern with correct rejections. Contradictory to this prediction, ERPs associated with false alarms showed the greatest positivity of all response types. The problem is that false alarms are rare events in this task, and the low number of such trials does not allow for robust comparisons and interpretations. A conservative interpretation of the results that would accommodate this pattern of false alarms would be that the first phase of feedback processing simply reflects the system's sensitivity to response probabilities. Correct rejections are the most common response type, followed by hits and misses that are about equal, with false alarms being the rarest type of response. Experiments with designs that elicit greater rates of false alarms should distinguish between the two interpretations discussed above. The critical point, however, is that, in this early phase, the system does not distinguish between hits and misses and is thus not reflecting the quality of monitoring performance.
In contrast to the early phase, the late phase (the third window of analysis), which falls within the window of the FCP, clearly distinguished between hits and misses and categorized the latter with false alarms. This pattern shows that the output of internal and external channels has been fully integrated at this point in order for the system to best adjust itself for performance optimization on future trials: The least positive are correct rejections, which require no adjustments. The next most positive are hits, which require adjustments to the primary production processes, but not to monitoring processes (hits indicate good monitoring). The most positive are misses and false alarms that are failures of the monitoring system and require a different type of adjustment than hits, such as adjustments to the criterion for detecting an error (Nozari & Hepner, 2018). This pattern of results is compatible with FCP indicating the need for adjustments (i.e., learning) in the system and is compatible with studies in the nonlinguistic literature that have linked a higher FCP amplitude to better learning outcomes (Glazer et al., 2018; Arbel & Wu, 2016; Arbel et al., 2013; Mangels, Butterfield, Lamb, Good, & Dweck, 2006; Butterfield & Mangels, 2003).
Finally, we found a strong positive correlation between the amplitude of the ERN and FRN in our study. There is an ongoing debate on whether the ERN and FRN arise from the same or different systems (Gentsch, Ullsperger, & Ullsperger, 2009; Müller, Möller, Rodríguez-Fornells, & Münte, 2005; Nieuwenhuis, Yeung, Holroyd, Schurger, & Cohen, 2004). Potts, Martin, Kamp, and Donchin (2011) showed that the spatial decomposition of both components yielded highly correlated factors, suggesting that they may share a common source in ACC, although the FRN had a more anterior distribution than the ERN, pointing to the possible existence of an additional frontal source. The finding of a positive correlation between the two components in our study is compatible with a common element between the two. Importantly, it shows that in tasks that are well learned, for example, typing of known words, a stronger index of internal monitoring goes hand in hand with a stronger index of feedback monitoring, suggesting that people who monitor their production more closely use both resources to do so. This is in contrast with the findings in learning a new task, in which the magnitude of ERN gradually increases as the magnitude of FRN decreases, suggesting a shift toward internal monitoring (e.g., Eppinger et al., 2008; Nieuwenhuis et al., 2002).
To summarize, the results of our FRN and FCP analyses, similar to the ERN and Pe analyses, point to domain-general processes underlying monitoring in language production. They further show that participants use the information from external feedback to check the output of the internal monitoring channel and integrate the two, possibly to guide adjustments in the primary production and monitoring processes to make the system more efficient for future use.
Our study is the first to report both error-related (ERN, Pe) and feedback-related (FRN, FCP) domain-general indices of monitoring in a task involving language production. The removal of visual feedback prompted a stronger reliance on internal monitoring processes, which resulted in lower correction rates, a clear ERN, and lower confidence in error detection. Indices of conscious error detection (Pe) were only observed when confidence in having made an error was high or external feedback was available. Feedback-related indices further showed that external and internal channel outputs are integrated, most likely to drive adjustments to the production and monitoring systems, and optimize future performance.
We coded errors according to their error types, namely, addition, deletion, and substitution of a letter (see Table A1). Only the first error of the word was classified.
| Target | tank |
Metacognitive Reports and Corrections by Error Types
Note that for the purpose of the current experiment (i.e., analyzing EEG data), only the first error in the word has been coded. Therefore, these data are not representative of the proportion of all errors in the sample.
|  | Corrections |  | Reports |
|  | Immediate | Delayed | Delayed |
Correction by Distance from the Error
In Figure A1, we see that about 75% of corrections are implemented right after or one keystroke after the incorrect keystroke (similarly in both conditions).
Major support was provided by the Therapeutic Cognitive Neurology Fund, with additional support from the Therapeutic Cognitive Neuroscience Professorship; the Benjamin and Adith Miller Family Endowment for Aging, Alzheimer's, and Autism; the Binder Family Charitable Fund; the Murren Family Foundation; the Wockenfuss Endowment and Fund; and anonymous donors.
Reprint requests should be sent to Svetlana Pinet, Department of Neurology, Johns Hopkins University, 1629 Thames Street, Suite 350, Baltimore, MD 21231 or via e-mail: firstname.lastname@example.org.
Data and analysis scripts are available at https://doi.org/10.17605/OSF.IO/7EBR4.
Data and analysis scripts are available at https://figshare.com/projects/Electrophysiological_correlates_of_monitoring_in_typing_with_and_without_visual_feedback/69770.