Can prediction error explain predictability effects on the N1 during picture-word verification?

Abstract Do early effects of predictability in visual word recognition reflect prediction error? Electrophysiological research investigating word processing has demonstrated predictability effects in the N1, or first negative component of the event-related potential (ERP). However, findings regarding the magnitude of effects and potential interactions of predictability with lexical variables have been inconsistent. Moreover, past studies have typically used categorical designs with relatively small samples and relied on by-participant analyses. Nevertheless, reports have generally shown that predicted words elicit less negative-going (i.e., lower amplitude) N1s, a pattern consistent with a simple predictive coding account. In our preregistered study, we tested this account via the interaction between prediction magnitude and certainty. A picture-word verification paradigm was implemented in which pictures were followed by tightly matched picture-congruent or picture-incongruent written nouns. The predictability of target (picture-congruent) nouns was manipulated continuously based on norms of association between a picture and its name. ERPs from 68 participants revealed a pattern of effects opposite to that expected under a simple predictive coding framework.

The N1 is also sometimes referred to as the N170 due to the timing of its peak in some studies, at around 170 ms. This typically occipitotemporal, negative-going component shows reliable differences between orthographic and […] to highly predictable words, were more positive-going over left-hemisphere sites, but more negative-going over right-hemisphere sites. In sum, while these studies using sentential contexts have reported predictability effects in the N1 window, it is clear that the timing and topography of effects, as well as interactions with frequency, have been inconsistent.

Some studies analysed two N1 windows (e.g., onset and offset). N1 windows reported to show a predictability effect are highlighted in black, while N1 windows that failed to show a predictability effect are highlighted in grey. Studies are listed in order of their mention in our review. For reference, the blue region displays the N1 period that we pre-registered.
Instead of manipulating error precision or certainty as the above studies have, […] were presented in contexts that were acutely predictive of the target (M Cloze = .90).
semantic decision (i.e., word vs. person's name) or silent word reading. In a French study examining word frequency effects across different go/no-go tasks, Strijkers et al. (2015) similarly reported that ERP amplitude in a period including the N1 (150-250 ms) was more sensitive to word frequency (with more negative amplitudes for higher-frequency words) during a semantic categorisation (i.e., animal vs. non-animal) than a colour categorisation (i.e., blue vs. non-blue) task. Wang […] greater N1 offset amplitudes than familiar Chinese characters, was greater when participants were led to expect Chinese characters.

bioRxiv preprint; this version posted August 7, 2023 (doi: https://doi.org/10.1101/2023.08.07.552265). The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

In addition to task manipulations, non-sentential semantic contexts, leading to predictions for specific words or categories of words, have also been used to investigate predictive processing. In an ERP study, Segalowitz and Zheng (2009) presented words and pseudowords for lexical decisions in two conditions: words were either drawn from a single category (e.g., animals), or from five different semantic categories. Segalowitz and Zheng reported an interaction between stimulus type (word vs. pseudoword) and expectation (one […] Zheng, an early sensitivity to category relevance during the N1 which, given the N1's robust sensitivity to orthography, is likely to reflect an influence of semantic-level predictions on orthographic processing.

In another attempt to modulate top-down expectancy without linguistic context, Dikker and Pylkkanen (2011) implemented a picture-noun phrase verification task. An image of a target object alone, or an image of objects related to the target object, was followed by a written noun phrase (article + noun) denoting the target object. They manipulated congruency and predictability. For congruent trials, the noun phrase referred to a food/drink or animal (e.g., the apple or the monkey) that matched the prior image of the object presented on its own or 'contained' in a stylized image (e.g., a grocery bag or Noah's Ark, respectively). In the incongruent condition, the noun phrase did not match the prior image (single object or collection of objects).
Predictability was considered high when the target object appeared on its own, and was considered low when the target object could be inferred to exist within the stylized images. Example conditions for the noun […] of the timing of such effects: its coarse temporal resolution means that mapping of picture content to representations in vOT could occur so late as to be irrelevant to initial orthographic word recognition processes.

One advantage of paradigms like picture-word verification tasks is that the researcher can control and manipulate variables like predictability and specificity of the […]

In the present study, we adapted the picture-word verification paradigm to examine the role of Predictability in prediction effects on the N1. We presented participants with PICTURE-word pairs that were congruent (e.g., ONION-onion) or incongruent (e.g., ONION-torch). Predictability of the congruent word was a continuous variable, dependent upon how often the noun is reliably used in naming the picture (Figure 2). By manipulating both Congruency and Predictability of word forms, we were able to examine whether the effect of Congruency on the N1 (sensitivity to prediction error) is contingent on Predictability (certainty or precision of prediction errors), in the manner expected according to a simple predictive coding account of the N1, in which observed N1 magnitude indexes prediction error.

We hypothesised, consistent with such a predictive coding account, that there would be a Congruency-Predictability interaction in which, at the highest levels of Predictability, N1s elicited by picture-incongruent words would be more negative-going than those elicited by picture-congruent words, while at the lowest levels of Predictability picture-congruent and -incongruent words should elicit N1s of similar magnitude. We anticipated three patterns of results that would have been consistent with this hypothesis:

Figure 2
PICTURE-word pairs were either congruent (e.g., NAPKIN-napkin) or incongruent (e.g., NAPKIN-weasel), while predictability of congruent picture-word pairs varied continuously. Ten example picture-congruent and -incongruent pairs are presented, with their predictability corresponding to the histogram bin they appear above.
(1) higher levels of Predictability lead to a reduction in N1 magnitude only for picture-congruent words, with no such effect for picture-incongruent words (Figure 3a);

(2) higher levels of Predictability lead to an increase in N1 magnitude only for picture-incongruent words, with no such effect for picture-congruent words (Figure 3b); or (3) both of these effects occurring together (Figure 3c).

In our power analysis, we focused on the first of these possible patterns of results, but importantly, the Congruency-Predictability interaction term that we pre-registered to test our hypothesis (https://osf.io/jk3r4) would capture any of these patterns, as the interaction term's coefficient would be in the same direction in all cases.

In our analysis, we found a pattern of effects counter to our pre-registered hypothesis (Figure 3d), with a Congruency-Predictability interaction in the opposite direction. An exploratory Bayesian analysis revealed that an interaction in the observed direction was 59.98 times more likely than one in the hypothesised direction. Based on these findings, we argue that such a simplistic predictive coding account is, at least on its own, insufficient to explain the pattern of prediction effects observed in the N1 during a picture-word verification task.

Figure 3
A comparison between the predicted (a,b,c) and observed (d) patterns of results. The predicted pattern of results was based on a predictive coding interpretation of the N1, according to which the magnitude of the N1 should be smaller for picture-congruent words relative to picture-incongruent words, and to a greater extent as Predictability increases. The observed pattern of results depicts the fixed effect predictions from the pre-registered linear mixed-effects model, with dashed lines depicting 95% bootstrapped prediction intervals (estimated from 5,000 bootstrap samples).
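The 95% bootstrapped prediction intervals in panel (d) came from 5,000 bootstrap samples of the fitted mixed-effects model. The underlying percentile-bootstrap logic can be sketched on a simple statistic; this is plain resampling of a mean, not the study's model-based bootstrap, and all names and values are illustrative.

```python
import random
import statistics

def percentile_bootstrap_ci(data, stat=statistics.mean,
                            n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample with replacement n_boot
    times, recompute the statistic, and take the alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(data)
    boots = sorted(stat([data[rng.randrange(n)] for _ in range(n)])
                   for _ in range(n_boot))
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Example: interval for the mean of evenly spaced observations (mean = 4.95)
sample = [0.1 * i for i in range(100)]
lo, hi = percentile_bootstrap_ci(sample)
```

The same resample-refit-requantile loop, applied to model predictions rather than a mean, yields the dashed interval bands in the figure.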
This study was pre-registered at https://osf.io/jk3r4 and the reported methodology and planned analysis conform to that specified in the pre-registration, except for two […]

The experiment included two separate tasks: the principal picture-word task was preceded by a localiser task to account for between-participant variability in the N1's timing and location. The details of stimulus selection and control, as well as presentation timing, are provided in the following sections. For clarity, we first introduce the overall Congruency-Predictability design of the picture-word task. In this task, pictures of single objects are presented, followed by a noun, and participants decide whether the noun […] were unimageable despite their high concreteness value (e.g., item). Plural words (e.g., sticks) were excluded, as most images in the BOSS have modal names that are singular. Finally, four images with modal names nut, trumpet, spinach, and tuba were excluded, as we judged these names to be incorrect descriptions of their images.

To avoid repetition effects, each image was presented once, with participants viewing either the associated picture-congruent or picture-incongruent word. This was counterbalanced by splitting the stimuli pseudo-randomly into two equally sized stimulus sets, referred to as Set 1 and Set 2. Each participant was presented with only one of these stimulus sets. Pictures followed by congruent words in Set 1 were followed by incongruent words in Set 2.

To generate stimuli for practice trials, 20 matched pairs of picture-congruent and -incongruent words were generated using the same pipeline as above, except that word frequency, word concreteness, and character bigram probability were not matched item-wise. The practice stimuli were generated from images and words not used in the experimental stimuli. The same practice trials were presented to all participants.
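The counterbalancing scheme described above can be sketched as follows. This is an illustrative reconstruction, not the study's code; item identifiers and function names are assumptions.

```python
import random

def split_counterbalance(picture_ids, seed=0):
    """Pseudo-randomly assign half the pictures to appear with their
    congruent word in Set 1; the complementary set shows each picture
    with the opposite congruency, so every item is seen once per set."""
    rng = random.Random(seed)
    ids = list(picture_ids)
    rng.shuffle(ids)
    congruent_in_set1 = set(ids[:len(ids) // 2])
    set1, set2 = [], []
    for pic in picture_ids:
        if pic in congruent_in_set1:
            set1.append((pic, "congruent"))
            set2.append((pic, "incongruent"))
        else:
            set1.append((pic, "incongruent"))
            set2.append((pic, "congruent"))
    return set1, set2

set1, set2 = split_counterbalance(range(200))
```

Each participant then receives only one set, so no picture or word repeats within a session, while across participants every item contributes to both conditions.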

Before embarking on the electrophysiological picture-word experiment, we first ran a proof-of-concept behavioural experiment using a different stimulus set generated from a very similar pipeline. We anticipated that increased Predictability should cause faster response times (RTs) for congruent trials and have either no effect or a minimal effect on performance for incongruent trials. The results from this behavioural validation are presented in Supplementary Materials B. In short, we observed a pattern of results consistent with our expectations, with Predictability leading to faster RTs for congruent trials, but having almost no effect on incongruent trials.

The precise location of the N1, and the timing of its peak amplitude, are known to vary across studies and among participants. As such, we did not specify a common N1 electrode or timepoint shared among all participants before data collection. Instead, we employed a localiser task to identify, within an appropriate region and time period of interest, the electrode and timepoint at which each participant's maximal sensitivity to orthography emerges (i.e., more extreme amplitudes for words than false-font stimuli). These data could then be used to extract N1 amplitudes in the picture-word task, while accounting for variability among participants in the timing and topography of orthographic processes.
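The selection step described above amounts to an argmin over the word-minus-false-font difference. A minimal sketch, assuming per-participant condition-average ERP arrays; in the study the search was restricted to a pre-specified region and time window, which this toy version omits.

```python
import numpy as np

def localise_n1(word_erp, falsefont_erp):
    """Return the (electrode, timepoint) index at which word-elicited
    amplitude is most extreme relative to false-font stimuli. For the
    negative-going N1, 'more extreme' means more negative, hence argmin."""
    diff = word_erp - falsefont_erp
    return np.unravel_index(np.argmin(diff), diff.shape)

# Demo with hypothetical participant-average ERPs (electrodes x timepoints)
word = np.zeros((4, 10))
word[2, 5] = -5.0          # strongest word-specific negativity
falsefont = np.zeros((4, 10))
electrode, timepoint = localise_n1(word, falsefont)
```

The returned index pair is then used to extract single-trial amplitudes from the picture-word task for that participant.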

Figure 5
Ten example stimuli for each stimulus type in the localiser task.
Each row represents a matched triplet of word, false-font string, and phase-shuffled word stimuli. The phase-shuffled word images were generated uniquely for each trial.

[…] showed an offset more extreme than ±25 mV (as measured on the BioSemi acquisition software, ActiView), or (2) if more than 5% of the trials were lost due to technical issues with the EEG system. As no participants satisfied these criteria, no participants were excluded after data collection. Data collection was approved by the Ethics Committee of the institution at which the data were collected (application number: 300200117).

Figure 7
Estimated relationship between number of participants and statistical power.

[…] PsychoPy (Peirce, 2007), and all code and materials are available in the repository associated with the study. All stimuli were presented centrally (horizontally and vertically). All trials in both tasks were presented in a pseudo-randomised order, such that no more than five consecutive trials required the same response from the participant. Trials were randomised across blocks, with the exception of the practice block, for which trials were randomised within the one block.
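The constraint that no more than five consecutive trials share a response can be implemented by rejection sampling: reshuffle until the longest same-response run is short enough. A sketch under that assumption; the study's actual randomisation code may differ.

```python
import random

def constrained_shuffle(trials, key, max_run=5, seed=None, max_tries=10_000):
    """Shuffle trials until no more than max_run consecutive trials share
    the same value of trials[i][key] (e.g., the required response)."""
    rng = random.Random(seed)
    trials = list(trials)
    for _ in range(max_tries):
        rng.shuffle(trials)
        run, longest = 1, 1
        for prev, cur in zip(trials, trials[1:]):
            run = run + 1 if cur[key] == prev[key] else 1
            longest = max(longest, run)
        if longest <= max_run:
            return trials
    raise RuntimeError("no valid ordering found")

# Example: 100 'yes' (congruent) and 100 'no' (incongruent) trials
trials = [{"response": "yes"}] * 100 + [{"response": "no"}] * 100
order = constrained_shuffle(trials, "response", max_run=5, seed=1)
```

Rejection sampling is simple and unbiased among valid orderings, at the cost of occasionally needing many reshuffles; with balanced response counts a valid ordering is found quickly in practice.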

A mistake in the lab setup, which we discovered after data collection, meant that the display screen was running at 120 Hz rather than the expected 60 Hz. As we were controlling stimulus presentation by screen refreshes, this meant that all our stimuli were presented for half the expected durations. For this reason, the veridical stimulus durations described here differ from those described in the pre-registration.
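The consequence of the refresh-rate mistake is simple frame arithmetic: a stimulus specified as a number of refreshes lasts half as long on a display running twice as fast. The 30-frame value below is illustrative, not a duration taken from the study.

```python
def presented_duration(n_frames, refresh_hz):
    """Duration (in seconds) of a stimulus shown for n_frames refreshes."""
    return n_frames / refresh_hz

n_frames = 30                                  # programmed for 500 ms at 60 Hz
intended = presented_duration(n_frames, 60)    # duration assumed in the code
actual = presented_duration(n_frames, 120)     # duration on the 120 Hz display
```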

Participants started with the localiser task, in the form of a lexical decision task (Figure 8a). The localiser task began with 30 practice trials, and was then followed by […] respond. Participants were requested to respond once the stimulus changed colour, quickly and accurately, to indicate whether the stimulus they saw in each trial was a word or not a word. The stimulus remained on screen until the participant responded.

Responses were given with the right and left control ('Ctrl') keys of a QWERTY keyboard, with the mapping of affirmative and negative responses counterbalanced across participants. After the participant had responded, there was a delay of around 100 ms (variable, as data were saved to disk during this interval), and then the next trial began.

After the localiser task, participants completed the picture-word task (Figure 8b), comprising an initial practice block of 20 trials, followed by 200 trials split into 5 blocks of […] by another interval jittered between 150 and 650 ms. The word was then presented in white Courier New font, at a height of 1.5° (width 1.07° for one character). After 500 ms, the word turned green, and participants could provide their response to indicate whether the word described the image they saw. The word remained on screen until the participant responded. As in the localiser task, responses were given with the right and left control ('Ctrl') keys of a QWERTY keyboard, with the mapping of affirmative and negative responses counterbalanced across participants, but kept consistent within participants across the two tasks. After participants had responded, there was a delay of around 100 ms (again variable, as data were saved to disk during this interval), and then the next trial began. There was no deadline for participants to respond. The instructions given to participants for the picture-word task are presented in Supplementary Materials E.
The first blocks of both tasks consisted of practice trials with 10 exemplars for each stimulus type (word, false-font string, or phase-shuffled image for the localiser task; congruent or incongruent noun for the picture-word task), during which participants were additionally given immediate feedback on their accuracy for each trial.

These practice trials were followed by green text reading "CORRECT!" if the participant responded correctly, or else by red text reading "INCORRECT!", presented in Courier New font with a height of 1.5°, for 1000 ms. Participants had self-paced breaks between blocks for each task. Before the practice trials and at the start of every experimental block, participants were presented with instructions for the task (available in Supplementary Materials E), summarising what would occur in each trial, and specifying that they should respond as quickly and accurately as possible once the stimulus turned green. These instructions also specified which keys participants should press to indicate their decision.

After each experimental block, including the practice trials, participants were presented with their average accuracy and median response time. After the practice trials, participants were additionally given the option to run the practice trials again. In the experimental blocks, no trial-level feedback was provided.

Recording

EEG data were recorded using a 64-channel BioSemi system, sampling at 512 Hz, with an online low-pass filter at the Nyquist frequency. Electrodes were positioned in the standard 10-20 system locations. Four electro-oculography (EOG) electrodes were placed to record eye movements and blinks: two were placed to the sides of the eyes (on the right and left outer canthi), and two below the eyes (on the infraorbital foramen). Electrode offset was kept stable and low throughout the recording, within ±25 mV, as measured by the BioSemi ActiView EEG acquisition tool. Electrodes whose activity exceeded this threshold were recorded but were removed (and interpolated) in data preprocessing.

The following section details the procedure applied to EEG data from each individual session, with the same pipeline being applied to both the localiser task and the picture-word task unless otherwise specified. EEG preprocessing was achieved using […]. Trials were excluded if responded to incorrectly (N = 368, or .02%, in the localiser task; N = 226, or .02%, in the picture-word task). Further trials were excluded if responded to later than 1500 ms after the word (or nonword) changed colour (N = 41, or .002%, in the localiser task; N = 42, or .003%, in the picture-word task).

Channels recorded as having offsets exceeding ±25 mV during data acquisition were removed from the data (in both tasks, 56 channels, or 1.27%, were removed across all participants), with their activity to be later interpolated. The EEG data were then re-referenced to the average activity across all electrodes and filtered with a 4th-order Butterworth filter between .1 and 40 Hz. To counteract the distortion in signals' timing (phase) that is inherent to causal filters, the filter was applied in both directions (i.e., two-pass), with the MATLAB function filtfilt(). In our pre-registration, we specified that we would apply a Butterworth filter with a bandpass of .5-40 Hz. However, after the pre-registration, we considered that, consistent with research into the effects of high-pass filters (Rousselet, […]

For the planned analysis, we pre-registered an approach to maximise sensitivity to […]

The planned analysis (pre-registered at https://osf.io/jk3r4) examined whether the hypothesised effect of a Predictability-dependent reduction of N1 amplitudes for picture-congruent words was observed at the electrode/timepoint at which each participant showed maximal sensitivity to orthography. We then present exploratory analyses, which respectively examine the Bayesian probability that our data are consistent with the hypothesis, and delineate the time-course of the Congruency-Predictability interaction. We […]

Figure 9
The left-lateralised occipitotemporal region of interest selected for the N1 (highlighted in red).

The planned analysis tested the pre-registered hypothesis of a Congruency-Predictability interaction in which N1 amplitudes are reduced (i.e., less negative-going) for picture-congruent trials than for picture-incongruent trials, and in which this difference is greatest at the highest levels of predictability, and smallest at the lowest levels of predictability. This was based on the notion that the N1 indexes prediction error in biasing contexts. We did not find evidence in favour of this hypothesis.

The trial-level N1 amplitudes from the picture-word task were modelled using a linear mixed-effects model […] with random-effects terms:

(1 + congruency * predictability | participant_id) +
(1 | word_id)

[…] Congruency-Predictability interaction similarly inconsistent with our hypothesis, which we derived from a simple predictive coding account of the N1.
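The terms above are lme4-style notation: by-participant random intercepts and slopes for the Congruency-Predictability interaction, crossed with by-word random intercepts. For readers working in Python, a comparable crossed-random-intercepts fit can be sketched with statsmodels on simulated data. This simplified sketch omits the by-participant random slopes of the study's model, and all variable names and values are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate one trial per participant-word pairing
rng = np.random.default_rng(1)
rows = []
for s in range(20):                      # simulated participants
    for w in range(20):                  # simulated words
        congruency = int(rng.integers(0, 2))
        predictability = float(rng.uniform(-1, 1))
        n1 = -2.0 + 0.3 * congruency - 0.2 * predictability + rng.normal(0, 1)
        rows.append((f"s{s}", f"w{w}", congruency, predictability, n1))
df = pd.DataFrame(rows, columns=["participant_id", "word_id",
                                 "congruency", "predictability", "n1"])

# statsmodels nests variance components within `groups`, so crossed random
# intercepts are specified with a single all-encompassing group.
df["group"] = 1
model = smf.mixedlm(
    "n1 ~ congruency * predictability", df, groups="group",
    vc_formula={"participant": "0 + C(participant_id)",
                "word": "0 + C(word_id)"},
)
fit = model.fit()
```

The coefficient of interest is the congruency:predictability fixed effect, whose sign corresponds to the direction of the interaction discussed throughout.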

Exploratory Bayesian Analysis

We observed a Congruency-Predictability interaction in the opposite direction (i.e., […] given this posterior distribution, that the Congruency-Predictability interaction is 59.98 times more likely to be less than zero than it is to be greater than zero (that is, BF01), which we consider to be strong evidence against our hypothesis.

The results (Figure 12) reproduced findings from the planned analysis, with increases in Predictability associated with more negative (larger) N1 amplitudes for picture-congruent words, and with less negative (smaller) N1 amplitudes for picture-incongruent words. The Congruency-Predictability interaction of interest remained negative, and thus in the opposite direction to that hypothesised, throughout the N1.
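The 59.98 value is a directional posterior odds: the posterior probability that the interaction coefficient is below zero, divided by the probability that it is above zero. Given posterior samples, the computation is a pair of proportions; the Normal(-0.4, 0.2) samples below are illustrative, not the study's posterior.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior samples for the Congruency-Predictability
# interaction coefficient (illustrative mean and SD)
posterior = rng.normal(-0.4, 0.2, size=100_000)

p_negative = (posterior < 0).mean()
p_positive = (posterior > 0).mean()
directional_odds = p_negative / p_positive   # analogous to the reported 59.98
```

For a posterior centred two standard deviations below zero, as here, the odds come out at roughly 40:1 against a positive coefficient.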

The sample-level analysis additionally suggested that the difference was largest in the N1's offset period (succeeding the peak). A later Congruency-Predictability interaction was also observed, peaking at around 400 ms (possibly resulting from effects in the N400 […] Figure 13). This showed more clearly that Predictability reduced amplitudes in the N1 for picture-incongruent words, but increased amplitudes for picture-congruent words. This difference peaked around 225 ms, but reversed in direction after 300 ms. It is of note that the peak of the observed effects in the N1 was later than originally anticipated (the planned analysis was limited to ≤200 ms). Nevertheless, the model intercept (Figure 12a) clearly shows that these effects peaked during the N1's offset period.

Central lines depict effect estimates, derived from sample-level models that were coded such that the model intercept lay at the respective levels of picture-word Congruency. Estimates reflect ERPs for words at the maximum level of Predictability, minus those at the minimum level of Predictability. Shaded areas depict 95% confidence intervals of model estimates.

In the present study, we tested whether a simple predictive coding account could […] in the N1: the direction of the interaction was opposite to that expected under the hypothesis. Specifically, increases in Predictability were associated with greater-amplitude N1s for picture-congruent words, and smaller-amplitude N1s for picture-incongruent words.

On this basis, we conclude that a simple predictive coding account of the N1 cannot explain the predictability effects observed in the picture-word verification task used here.

In recent years, predictive coding models have been increasingly applied to explain […] to better account for these data, it would require elaboration. One feature that may be relevant is the nature of the task. We elected to use a picture-word verification task as it encourages explicit prediction of word forms from non-linguistic contexts. However, this task paradigm may alter predictive processing of word forms in two key ways. First, participants will soon have learned that the observed word form only matches its preceding […] predictive coding mechanisms may ultimately underlie the pattern of effects we observed, the simple account we have tested requires elaboration, informed by insights from other paradigms, for it to explain why our current pattern of effects is opposite to that expected. Nevertheless, we acknowledge the possibility that the insufficiency of predictive coding accounts to explain the data we observed may reflect a more fundamental shortcoming. To speculate, predictive coding models may account for activity in the N1 in previously tested paradigms without accurately describing the underlying neural processes.
It is possible that the picture-word verification paradigm we applied here engages the same neurocognitive processes in the N1 as other paradigms, but elicits cognitive dynamics whose corresponding neural activity diverges from a predictive coding model. That is, processing indexed by the N1 may only be explicable by a model distinct from the predictive coding framework, even if predictive coding models correlate with the patterns of activity seen in most paradigms. Justifying the development of such a model, distinct from predictive coding, would require much more evidence of the shortcomings of a predictive coding account, and we do not believe our study provides the insights necessary to speculate on the form such a model could take.

If a predictive coding account is to explain prediction-driven modulation of activity in the N1, or any component, we believe it is vital for researchers to consider the informational content of the representations whose processing is indexed by the component thought to capture prediction error. In a hierarchical model of predictive coding, where levels of the hierarchy utilise different representational formats, the interaction between ascending input and descending predictions must involve some mapping of higher-level onto lower-level representations. For instance, if semantic context can influence processing that is closer to sensory input and indexed by early ERP components (e.g., …), … units whose strength is determined by co-occurrence frequency. We consider this point to highlight an advantage of paradigms such as ours that use non-linguistic contexts (e.g., task instructions, images, etc.) to cue upcoming words and word forms. Effects of context that map across representations in this way necessitate transfer of information across levels of the processing hierarchy, and may thus be considered stronger evidence for an influence of top-down predictions.

An aspect of the predictive coding account that our design did not fully test also relates to this idea of representational mapping. We dichotomised the variable of congruency (prediction error magnitude), with orthographic Levenshtein distance maximised between picture-congruent and picture-incongruent word forms. However, prediction error magnitude should also be expected to vary continuously, as unpredicted word forms range from less to more orthographically similar to the predicted word form. This is comparable to Gagl et al.'s (2020) use of a pixel distance metric to calculate the continuous distance between a presented word form and a context-neutral prior. Such an approach could be applied to biasing contexts by instead calculating the orthographic distance between a presented word form and a context-informed prior, where the probability of observing certain pixels (or orthographic features) could be up-weighted in proportion to prediction certainty. We believe such an approach could provide useful insights in elucidating the pattern of effects we observed.
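As a minimal sketch of the kind of continuous metric described above (illustrative only; this is not the pixel-based measure Gagl et al., 2020, actually computed, and the linear certainty weighting is our assumption), prediction error magnitude could be proxied by the orthographic Levenshtein distance between a presented word form and the context-predicted word form, scaled by prediction certainty:

```python
def levenshtein(a: str, b: str) -> int:
    """Orthographic edit distance between two letter strings,
    via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),    # substitution (0 if match)
            ))
        prev = curr
    return prev[-1]


def orthographic_prediction_error(presented: str, predicted: str,
                                  certainty: float) -> float:
    """Continuous prediction-error proxy: orthographic distance to the
    context-informed prior, weighted by prediction certainty.
    The linear weighting is an illustrative assumption, not a fitted model."""
    return certainty * levenshtein(presented, predicted)
```

On this toy metric, an unpredicted word form one letter away from a strongly predicted name (e.g., "coat" after a picture of a goat, certainty 0.9) yields a smaller error than a wholly dissimilar word form, capturing the graded variation our dichotomised congruency manipulation left untested.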

In sum, we tested a simple predictive coding account of the word-elicited N1, but failed to find evidence in its favour. Exploratory analyses suggest that the pattern of effects in the Congruency-Predictability interaction was in the opposite direction to that expected under a simple predictive coding model. We argue that such a model is insufficient to explain the pattern of effects we observed, and we have identified avenues of future research that could better delineate how predictive processes interact with processing during the N1.

bioRxiv preprint, posted August 7, 2023; https://doi.org/10.1101/2023.08.07.552265. Available under a CC-BY-NC-ND 4.0 International license.