The perceptual system integrates synchronized auditory–visual signals in part to promote individuation of objects in cluttered environments. The processing of auditory–visual synchrony may more generally contribute to cognition by synchronizing internally generated multimodal signals. Reading is a prime example because the ability to synchronize internal phonological and/or lexical processing with visual orthographic processing may facilitate encoding of words and meanings. Consistent with this possibility, developmental and clinical research has suggested a link between reading performance and the ability to compare visual spatial/temporal patterns with auditory temporal patterns. Here, we provide converging behavioral and electrophysiological evidence suggesting that greater behavioral ability to judge auditory–visual synchrony (Experiment 1) and greater sensitivity of an electrophysiological marker of auditory–visual synchrony processing (Experiment 2) both predict superior reading comprehension performance, accounting for 16% and 25% of the variance, respectively. These results support the idea that the mechanisms that detect auditory–visual synchrony contribute to reading comprehension.
The perceptual system integrates synchronized auditory and visual signals (e.g., Guzman-Martinez, Ortega, Grabowecky, Mossbridge, & Suzuki, 2012; Iordanescu, Grabowecky, & Suzuki, 2011; Iordanescu, Grabowecky, Franconeri, Theeuwes, & Suzuki, 2010; Van der Burg, Cass, Olivers, Theeuwes, & Alais, 2010; Iordanescu, Guzman-Martinez, Grabowecky, & Suzuki, 2008; Van der Burg, Olivers, Bronkhorst, & Theeuwes, 2008; Smith, Grabowecky, & Suzuki, 2007; Molholm, Ritter, Javitt, & Foxe, 2004; Shimojo & Shams, 2001; Driver & Spence, 1998; Driver, 1996; Stein, Meredith, Huneycutt, & McDade, 1989), and crossmodal integration is prevalent in early sensory areas as well as in subcortical and higher cortical regions (e.g., Driver & Noesselt, 2008; Stein & Stanford, 2008; Ghazanfar & Schroeder, 2006, for reviews). This is advantageous because synchronized auditory–visual (AV) signals typically originate from a single object, so that integrating them into a common source facilitates individuation and selection of objects in a cluttered environment (e.g., Van der Burg et al., 2008, 2010; Driver, 1996).
An intriguing possibility is that AV synchrony processing mechanisms may also play a general role in cognitive processes by facilitating the synchronization of internally generated crossmodal signals. For instance, while engaging in problem solving, thought processes may be facilitated when internally generated imagery (visual, auditory, tactile, etc.) is appropriately synchronized with a train of verbal thoughts. Although such a general hypothesis is difficult to evaluate, reading provides an interesting test case as prior research has provided indirect evidence suggesting a relationship between AV synchrony processing and reading.
For example, the ability to crossmodally (in addition to unimodally) compare auditory and visual rhythms accounted for some variability in reading performance in children (Rose, Feldman, Jankowski, & Futterweit, 1999: a follow-up study to Birch & Belmont, 1964). Because Rose et al. (1999) used a combination of decoding and comprehension tests to evaluate reading performance, their result suggests that the ability to temporally align the remembered rhythm in one modality with a subsequently presented rhythm in another modality is relevant to the decoding and/or comprehension aspects of reading. Training reading-impaired children to perceptually match an auditory rhythm with a concurrently presented static visual pattern that spatially represented a rhythm from left to right improved word reading in those children (Kujala et al., 2001). This result suggests that the ability to shift visual attention in synchrony with an auditory rhythm contributes to decoding and may also contribute to semantic access because Kujala et al. (2001) used real words as the stimuli. Furthermore, irrelevant auditory stimuli interfere with visual temporal judgments across a longer duration in dyslexic adults than in nondyslexic adults (Hairston, Burdette, Flowers, Wood, & Wallace, 2005), suggesting that an abnormally wide temporal window of AV integration, potentially causing less precise processing of AV synchrony, is related to impaired processing of text in dyslexia.
To which reading-related processes might AV synchrony processing contribute? A feasible candidate is word decoding, the process of generating phonological representations in synchrony with visual orthographic processing—a process that appears to be impaired in dyslexia (e.g., Lehongre, Ramus, Villiermet, Schwartz, & Giraud, 2011; Vellutino, Fletcher, Snowling, & Scanlon, 2004; Breznitz & Misra, 2003; Breznitz, 2002; Adams, 1990). Nevertheless, crossmodal synchrony processing may also contribute to higher-order processes relevant to reading comprehension. For example, crossmodal synchrony processing may facilitate semantic access because the retrieval of word meanings may benefit from the synchronization of sensory processes with dynamic activation of multimodal memory representations (e.g., Kutas & Federmeier, 2011; Federmeier & Laszlo, 2009). Note that reading-related processes are not strictly serial in that word decoding is likely to involve semantic access (e.g., Hoover & Gough, 1990). Reading comprehension ultimately requires the integration of meanings across sentences and paragraphs to understand causal and referential relationships for generating a coherent interpretation of a story (e.g., Rapp, van den Broek, McMaster, Panayiota, & Espin, 2007; Gough & Tunmer, 1986). Because these integrative processes are limited by attention and working memory (e.g., Rapp et al., 2007; Van den Broek, Rapp, & Kendeou, 2005; King & Kutas, 1995), crossmodal synchrony processing may facilitate the timely shifting of attention and eye movements across text (e.g., to confirm causal and referential relations) and/or the timely activation of working memory and generation of mental imagery (e.g., to help integrate semantic relationships into a coherent interpretation) in coordination with the progression of phonological, syntactic, and semantic processing. 
Crossmodal synchrony processing may contribute to these higher-order processes by facilitating the temporal coordination of information flow across the temporo-frontal language circuit (e.g., Friederici, 2012), motor areas, and sensory cortices.
The goal of the current study was to demonstrate a direct relationship between basic sensitivity to AV synchrony and reading comprehension in typically developing adults. We evaluated sensitivity to AV synchrony using both behavioral and electrophysiological methods. To measure AV synchrony sensitivity independently of reading and linguistic abilities, we used nonlinguistic stimuli such as flashes, beeps, dynamic visual patterns, and music. To measure reading ability, we used a task that evaluated the ability to comprehend extended text. Because the current study was the first attempt at demonstrating the hypothesized direct association between AV synchrony sensitivity and reading in typically developing adults, we wished to maximize the chance of detecting such an association by using a task that required multiple processes involved in reading (word decoding, semantic access, and integration), all of which could potentially benefit from crossmodal synchronization (as discussed above). Furthermore, because the typical goal of reading is comprehension of ideas conveyed across multiple paragraphs, we sought to demonstrate the relationship between the basic processing of AV synchrony and the ability to understand an extended text. We provide converging behavioral and electrophysiological evidence demonstrating that an individual's ability to judge AV synchrony and an individual's left-lateralized electrophysiological sensitivity to AV synchrony both predict reading comprehension performance.
EXPERIMENT 1: THE ABILITY TO JUDGE AV SYNCHRONY PREDICTS TEXT COMPREHENSION PERFORMANCE
We measured the behavioral ability to judge AV synchrony using a standard synchrony judgment task (e.g., Van Eijk, Kohlrausch, Juola, & van de Par, 2008), where a visual flash and an auditory beep were presented with a range of SOAs (including both auditory-presented-first and visual-presented-first trials), and participants judged whether the auditory and visual stimuli were presented simultaneously. Perfect performance would entail responding “yes” on trials where a flash and a beep were presented synchronously and responding “no” on all other trials where a flash and a beep were presented asynchronously. In reality, participants respond “yes” on asynchronous trials when AV SOAs are small enough that the asynchrony is not detected. The function relating the proportion of “yes” responses to the AV SOA is well fit by a Gaussian curve (e.g., Van Eijk et al., 2008), and the standard deviation of the fitted Gaussian curve provides a measure of the just noticeable difference (JND), with a smaller JND indicating greater perceptual sensitivity to AV synchrony. Thus, if AV synchrony processing contributes to reading comprehension, individuals with smaller JNDs should be superior at reading comprehension.
However, even if we found this association, it might reflect individual differences in general factors such as attentiveness, effort, and/or intelligence or in the quality of temporal coding rather than individual differences in the ability to detect AV synchrony per se. We thus included a control task where participants judged the temporal order between a visual flash and an auditory beep. The stimuli in the temporal order judgment task were identical to those in the synchrony judgment task; both tasks required the processing of AV timing, except that participants focused on AV synchrony in the synchrony judgment task, whereas they focused on AV order in the temporal order judgment task.
Importantly, whereas the synchrony judgment task provides a direct measure of participants' sensitivity to AV synchrony, the temporal order judgment task forces participants to make order judgments even when the visual and auditory stimuli are perceived to be simultaneous, thus reflecting processes other than AV synchrony detection (e.g., García-Pérez & Alcalá-Quintana, 2012; Van Eijk et al., 2008). If reading comprehension is selectively associated with the sensitivity to AV synchrony, comprehension performance should be strongly associated with the JND estimated using the AV synchrony judgment task but significantly less associated with the JND estimated using the AV temporal order judgment task. Alternatively, because general task demands and required processes are similar for the two tasks (both requiring the processing of auditory and visual timing for the same set of stimuli), if reading comprehension is associated with general factors such as attentiveness, effort, intelligence, and/or the quality of temporal coding, reading comprehension performance should be equivalently associated with the JND measured with either task.
Fifty-one Northwestern University undergraduate students (aged 18–22 years) gave informed consent to participate for partial course credit or monetary compensation. All were right-handed native English speakers with normal hearing and normal or corrected-to-normal vision. They were individually tested in a dimly lit room.
Stimuli and Procedure
Participants performed three tasks in the following order: an AV synchrony judgment task, an AV temporal order judgment task, and a reading comprehension task. The AV synchrony judgment task was given first for two reasons. First, had participants begun with the AV temporal order judgment task, they might have been led to consider temporal order while judging AV simultaneity. Second, any incidental correlation (e.g., based on slow fluctuations in arousal) would be more likely to occur between tasks that are performed in closer temporal proximity. Because we hypothesized a selective association between AV synchrony judgment and reading comprehension, having participants perform the synchrony judgment task first, the temporal order judgment task next, and the reading comprehension task last reduced the probability that the predicted association would be observed as the result of incidental correlation. We now describe the three tasks in detail.
The AV synchrony judgment task
Participants sat in a comfortable chair and were instructed to look at the center of a dark (4.9 cd/m²) display monitor throughout the experiment. They pressed the space bar on the computer keyboard to start the experiment. After an interval jittered between 1 and 3 sec, a bright circle (35 cd/m² and 5.1° diameter, presented at the center of the screen) and a tone (3500 Hz, 48 dB SPL(A), presented through headphones), each lasting 10 msec, were presented at one of eight SOAs: −250, −187.5, −125, −62.5, 62.5, 125, 187.5, and 250 msec (a range typically used in AV synchrony judgment and AV temporal order judgment tasks; e.g., Van Eijk et al., 2008), where the negative values indicate that the auditory tone was presented first and the positive values indicate that the visual flash was presented first. Participants indicated whether the visual flash and the auditory beep were simultaneous by pressing a corresponding key in a nonspeeded manner (but with a response deadline of 2.5 sec). The next AV stimulus was presented after an interval jittered between 1 and 3 sec. Each AV SOA was tested 10 times in a randomized order for a total of 80 trials. Before these experimental trials, 10 practice trials were given with SOAs randomly chosen from coarsely sampled intervals: −500, −400, −250, −100, 100, 250, 400, and 500 msec.
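As a rough illustration of the trial structure just described, the following Python sketch builds a randomized schedule of 80 trials. It is not the MATLAB/Psychtoolbox code used in the study; the function name `build_trials`, the `seed` argument, and the uniform distribution of the jitter are assumptions made for this example.

```python
import random

# SOAs in msec as listed in the text: negative = tone first, positive = flash first.
SOAS_MS = [-250, -187.5, -125, -62.5, 62.5, 125, 187.5, 250]
REPS_PER_SOA = 10


def build_trials(seed=None):
    """Return a shuffled list of (soa_ms, pre_stimulus_interval_s) tuples."""
    rng = random.Random(seed)
    soas = SOAS_MS * REPS_PER_SOA  # 8 SOAs x 10 repetitions = 80 trials
    rng.shuffle(soas)
    # The interval before each stimulus is jittered between 1 and 3 sec.
    # A uniform jitter is an assumption; the text does not give the distribution.
    return [(soa, rng.uniform(1.0, 3.0)) for soa in soas]


trials = build_trials(seed=0)
```

Each entry pairs one SOA with the jittered interval that precedes the stimulus, so a presentation loop could simply iterate over `trials`.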
The proportion of “synchronous” responses should be maximal at a specific stimulus delay corresponding to the participant's AV processing delay and should monotonically decrease as the AV SOA is increased in the negative and positive directions. As described earlier, the resultant bell curve is reasonably well fit by a Gaussian function (e.g., Van Eijk et al., 2008), and the JND for the AV synchrony judgment is reflected in the standard deviation of the fitted Gaussian curve, with a smaller standard deviation (i.e., a smaller JND) indicating greater sensitivity to AV synchrony.
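The JND estimate can be sketched as follows, assuming a simple least-squares fit. The text does not specify the fitting procedure, so the grid search here is only a stand-in, and the response proportions are synthetic values generated for the example rather than data from the study. The names `gaussian` and `fit_gaussian` are hypothetical.

```python
import math


def gaussian(soa, amp, mu, sigma):
    """Gaussian-shaped proportion of 'synchronous' responses at a given SOA."""
    return amp * math.exp(-0.5 * ((soa - mu) / sigma) ** 2)


def fit_gaussian(soas, p_sync):
    """Least-squares fit by grid search over mu and sigma (msec).

    For each (mu, sigma) pair the best amplitude has a closed form,
    so only two parameters need to be searched.
    """
    best = None
    for mu in range(-150, 151, 5):          # candidate peak locations
        for sigma in range(20, 401, 5):     # candidate standard deviations
            g = [gaussian(s, 1.0, mu, sigma) for s in soas]
            amp = sum(gi * pi for gi, pi in zip(g, p_sync)) / sum(gi * gi for gi in g)
            amp = min(max(amp, 0.0), 1.0)   # a proportion stays within [0, 1]
            sse = sum((amp * gi - pi) ** 2 for gi, pi in zip(g, p_sync))
            if best is None or sse < best[0]:
                best = (sse, amp, mu, sigma)
    return best[1], best[2], best[3]


soas = [-250, -187.5, -125, -62.5, 62.5, 125, 187.5, 250]
# Synthetic proportions from a known curve (NOT data from the study).
p_sync = [gaussian(s, 0.9, 20, 100) for s in soas]

amp_hat, mu_hat, jnd = fit_gaussian(soas, p_sync)  # jnd = SD of the fitted Gaussian
```

Here `mu_hat` corresponds to the stimulus delay at which "synchronous" responses peak, and `jnd` is the fitted standard deviation, with smaller values indicating greater sensitivity to AV synchrony.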
The AV temporal order judgment task
The stimuli and procedure were identical to those for the AV synchrony judgment task except that, on each trial, participants indicated whether the visual flash or the auditory beep was presented first. The proportion of “visual first” responses forms a sigmoidal function of stimulus delay, passing through 50% at a specific stimulus delay corresponding to the participant's AV processing delay. The sigmoidal curve is reasonably well fit by a cumulative Gaussian function (e.g., Van Eijk et al., 2008), and the JND for AV temporal order judgment is reflected in the standard deviation of the fitted cumulative Gaussian curve, with a smaller standard deviation (i.e., a smaller JND) indicating greater sensitivity to AV temporal order.
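A corresponding sketch for the temporal order judgment, again assuming a least-squares grid search that the text does not itself specify. The proportions are synthetic, and the function names are hypothetical.

```python
import math


def cumulative_gaussian(soa, mu, sigma):
    """Proportion of 'visual first' responses modeled as a cumulative Gaussian."""
    return 0.5 * (1.0 + math.erf((soa - mu) / (sigma * math.sqrt(2.0))))


def fit_cumulative_gaussian(soas, p_visual_first):
    """Least-squares fit by grid search over mu and sigma (msec)."""
    best = None
    for mu in range(-150, 151, 5):          # candidate 50% points
        for sigma in range(20, 401, 5):     # candidate standard deviations
            sse = sum(
                (cumulative_gaussian(s, mu, sigma) - p) ** 2
                for s, p in zip(soas, p_visual_first)
            )
            if best is None or sse < best[0]:
                best = (sse, mu, sigma)
    return best[1], best[2]


soas = [-250, -187.5, -125, -62.5, 62.5, 125, 187.5, 250]
# Synthetic proportions from a known curve (NOT data from the study).
p_visual_first = [cumulative_gaussian(s, -30, 80) for s in soas]

delay_hat, jnd_toj = fit_cumulative_gaussian(soas, p_visual_first)
```

`delay_hat` is the stimulus delay at which the fitted curve passes through 50%, and `jnd_toj` is the fitted standard deviation.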
The reading comprehension task
As discussed in the Introduction section, we evaluated the comprehension of extended text, the typical goal of reading, which required multiple component processes including word (orthographic-to-phonological) decoding, semantic access, working memory, and the integration of causal and inferential relationships across texts. Our extended text consisted of the first 1182 words of the first chapter of Doctor Pascal by Emile Zola (Lexile score = 1170). We chose this text because it is both conceptually rich and sufficiently unfamiliar; none of our participants had previously read the text. To minimize the potential effects of eye movements on reading (e.g., Henderson & Luke, 2014) and to control for individual differences in reading speed, we presented the extended text one word at a time (with accompanying punctuation marks) at the center of the computer monitor at the rate of 300 msec per word (with a 200-msec interword interval), comparable with the average rate of prose reading for college students (e.g., Carver, 1992). The white (76.8 cd/m²) text in Times font was presented against a dark (4.9 cd/m²) background, with vertical visual angles ranging from 0.49° to 0.73° and horizontal visual angles ranging from 0.61° to 3.64°.
The extended text was presented twice, first in a scrambled order and then in the correct order. The primary reason for this manipulation was to verify that the participants made an effort to comprehend the story as well as to control for individual differences in the amount of effort, arousal, and carefulness (see the Results section). In both presentation conditions, participants were instructed to press the mouse button as quickly and accurately as possible every time they saw the word “and.” In the scrambled order condition, the only task was to respond to “and.” In the correct order condition, participants had to comprehend the story as well as respond to “and.” Slowing of responses to “and” in the dual-task correct order condition relative to the scrambled order condition indicates that the participants made an effort to comprehend the story, thus paying less attention to the “and”-detection task. A benefit of including the scrambled order condition before the correct order condition was that it increased the temporal distance between the initial AV synchrony judgment task and the reading comprehension task.
Reading comprehension was evaluated at the end of the correct order reading condition using a modified version of the multiple-choice questions we previously developed (see Mossbridge, Grabowecky, Paller, & Suzuki, 2013, for the full extended text and the test questions). The previous set included four questions, each accompanied by four answer choices. Participants were told that any number of choices could be correct for each question and that they should circle all choices that they thought were correct. Questions 1–3 included two correct choices, whereas Question 4 included only one correct choice. A stringent scoring method (each question was scored as correct only if all correct choices were circled, and none of the incorrect choices were circled) ensured that chance performance would result in a low score; specifically, the probability of getting each question correct by chance was only 6.7% = 1/(2⁴ − 1), where the −1 accounts for the fact that participants knew that each question had at least one correct answer (thus, not circling any answer was not an option).
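The chance level follows from counting response patterns: each of the four choices is either circled or not, which gives 2⁴ = 16 patterns, and excluding the empty response leaves 15, only one of which matches the answer key. The short Python sketch below enumerates these patterns under the all-or-none scoring rule. The answer key `{"A", "C"}` and the function name `score_question` are hypothetical and serve only the illustration.

```python
from itertools import product


def score_question(circled, correct):
    """All-or-none scoring: 1 only if the circled set equals the correct set."""
    return int(set(circled) == set(correct))


choices = "ABCD"
# Every way of circling the four choices, excluding the empty response.
patterns = [
    {c for c, keep in zip(choices, mask) if keep}
    for mask in product([False, True], repeat=4)
    if any(mask)
]

correct = {"A", "C"}  # hypothetical answer key for one question
chance = sum(score_question(p, correct) for p in patterns) / len(patterns)  # 1/15
```

The same count holds whether a question has one or two correct choices, because exactly one circling pattern matches any given key.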
We previously used the total score from all four questions as the measure of reading comprehension (Mossbridge, Grabowecky, Paller, et al., 2013). Here, because we obtained American College Testing (ACT) reading scores (www.act.org/content/act/en/products-and-services.html) from 25 of the participants, we were able to assess the reliability of the four questions. Questions 1–3 were all moderately correlated with ACT reading score (Pearson's r ranging from .20 to .27), whereas Question 4 was not (r = .07). We thus used the average of the scores from Questions 1 through 3 as the measure of reading comprehension, which was significantly correlated with ACT reading score, r = .44, t(23) = 2.348, p < .03.
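The t statistics reported alongside the correlations follow the standard conversion t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. A minimal Python sketch, using the rounded r = .44 and n = 25 from the text (the function name `t_from_r` is hypothetical):

```python
import math


def t_from_r(r, n):
    """Convert a Pearson correlation to a t statistic with n - 2 degrees of freedom."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df


# Rounded correlation and sample size from the text.
t_value, df = t_from_r(0.44, 25)
```

With the rounded r, this gives t(23) of about 2.35. The small difference from the reported 2.348 presumably reflects rounding of r.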
All visual stimuli were presented on a 21-in. color CRT monitor (1024 × 768 pixel resolution) at a 100-Hz refresh rate, and all auditory stimuli were presented through Sennheiser HD280 Pro (Wedemark, Germany) headphones. The experiment was controlled by a PC running Windows XP, using MATLAB software (The MathWorks, Inc., Natick, MA) with Psychtoolbox extensions (Version 3.10.11; Kleiner, Brainard, & Pelli, 2007; Brainard, 1997; Pelli, 1997) for the AV synchrony judgment and AV temporal order judgment tasks and using Presentation software (Version 11.0, Build 04.25.07, www.neurobs.com) for the reading comprehension task. The viewing distance was 123 cm.
The AV synchrony judgment JND (inversely related to the sensitivity for discriminating between AV synchrony and asynchrony; see the Methods section) was negatively correlated with the comprehension score, r = −.41, t(48) = 3.14, p < .003 (Figure 1; note that one participant whose data point fell outside the 95% confidence ellipses was removed from the analysis; however, statistical results were unchanged when this outlier was included). This relationship indicates that greater sensitivity for judging AV synchrony is associated with superior reading comprehension.
The specificity of this association is suggested by the lack of a similar correlation between the AV temporal order judgment JND (inversely related to the sensitivity for judging AV temporal order) and the reading comprehension score, r = −.13, t(48) = 0.90, p > .37. To confirm this specificity, we tested a multiple regression model with both the AV synchrony judgment JND and the AV temporal order judgment JND as regressors; only the AV synchrony judgment JND significantly predicted the reading comprehension score (t(47) = −2.95, p < .005, for the AV synchrony judgment JND; t(47) = 0.12, p > .90, for the AV temporal order judgment JND).
Results from the “and”-detection task provided evidence in support of the specific association between the ability to judge AV synchrony and reading comprehension performance. Overall, responses to the word “and” were significantly worse (longer RTs and higher error rates) in the dual-task correct order (comprehension) condition than in the scrambled order condition (M = 647 [SE = 11] vs. M = 578 [SE = 9] msec, t(50) = 7.79, p < .0001, for RTs; M = 0.16 [SE = 0.02] vs. M = 0.05 [SE = 0.01], t(50) = 6.06, p < .0001, for error rates), confirming that the participants made an effort to comprehend the extended text in the correct order condition.
It is reasonable to assume that participants who made more effort in comprehending the story would have incurred increased RTs and error rates in the “and”-detection task in the correct order condition (in which comprehension was required) relative to the scrambled order condition (in which comprehension was not required). Thus, individual differences in the “and”-detection RTs and error rates in the correct order condition residualized to those in the scrambled order condition provided a measure of individual differences in the amount of effort participants made to comprehend the story. If some of the individual differences in comprehension performance were accounted for by individual differences in comprehension effort, this would be captured by positive correlations between the residualized RT and/or residualized error rate and the comprehension score. However, the residualized RT was uncorrelated with the comprehension score, r = −.03 (t(48) = 0.20, p > .84), although the residualized error rate was marginally positively correlated with the comprehension score, r = .26 (t(48) = 1.86, p < .070). These low correlations suggest that the comprehension score primarily reflected an individual's reading comprehension ability rather than the amount of effort he or she made toward the reading comprehension task. Importantly, when we included the residualized error rate in addition to the AV synchrony judgment JND and AV temporal order judgment JND as regressors, only the AV synchrony judgment JND significantly predicted the comprehension score (t(46) = −2.51, p < .016, for the AV synchrony judgment JND; t(46) = 0.23, p > .81, for the AV temporal order judgment JND; t(46) = 0.84, p > .40, for the residualized error rate). Finally, the “and”-detection RTs and error rates in the scrambled order condition additionally provided measures of general arousal (reflected in faster RTs) and carefulness (reflected in reduced error). 
These measures were uncorrelated with the comprehension score, r = −.07 (t(48) = 0.51, p > .61) for RTs and r = −.0002 (t(48) = 0.002, p > .99) for error rates, suggesting that the individual differences in the comprehension score in this study were unlikely to have been influenced by individual differences in general arousal or carefulness.
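The residualization step can be illustrated with a simple least-squares regression: the correct order measure is regressed on the scrambled order measure, and the residuals serve as the effort index. The sketch below is an assumption about how such residuals might be computed, not the analysis code used in the study, and the RT values are invented for the example.

```python
import math


def residualize(y, x):
    """Residuals of y after a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Invented RTs in msec for five hypothetical participants (NOT data from the study).
rt_scrambled = [560.0, 575.0, 580.0, 590.0, 600.0]
rt_correct = [640.0, 630.0, 660.0, 650.0, 670.0]

rt_residual = residualize(rt_correct, rt_scrambled)
```

By construction, `rt_residual` is uncorrelated with `rt_scrambled`, so it captures only the part of the correct order RT that the scrambled order RT does not predict. It can then be correlated with the comprehension score using `pearson_r`.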
Taken together, these results suggest that the ability to judge AV synchrony is uniquely associated with the ability to comprehend extended text because, (1) whereas the ability to judge AV synchrony predicted the comprehension score, the ability to judge AV temporal order did not; (2) the measures of the effort participants made to comprehend the text, general arousal, and carefulness did not predict the comprehension score; and (3) only the ability to judge AV synchrony predicted the comprehension score in a multiple regression model that also included the ability to judge AV temporal order and the accuracy-based measure of comprehension effort that was marginally correlated with the comprehension score. In the next experiment, we aimed to provide converging electrophysiological evidence suggesting that the sensitivity of neural mechanisms that may underlie AV synchrony detection is associated with reading comprehension.
EXPERIMENT 2: THE SENSITIVITY OF THE LEFT-FRONTAL AUDITORY STEADY-STATE RESPONSE TO AV SYNCHRONY PREDICTS TEXT COMPREHENSION PERFORMANCE
In a previous study, we identified an EEG index that reflected neural sensitivity to AV synchrony in complex stimuli (Mossbridge, Grabowecky, & Suzuki, 2013). In that study, we presented classical music with a visualizer that matched the dynamics of the music in terms of changes in luminance, color, and motion. The visualizer was presented either synchronously or asynchronously with the music. We amplitude-modulated the music at 40 Hz (which sounded like listening to the music through an electric window fan) and monitored the 40-Hz EEG component that was phase-locked to the amplitude modulation, known as auditory steady-state response or ASSR. This 40-Hz ASSR, primarily localized in frontal scalp regions, is thought to track the auditory cortical activity in response to amplitude-modulated sounds (e.g., Ross, Herdman, & Pantev, 2005; Picton, John, Dimitrijevic, & Purcell, 2003; Herdman et al., 2002; Gutschalk et al., 1999). Importantly, because the 40-Hz ASSR was phase-locked to the music, any modulation of its amplitude by AV alignment reflected the influences of AV synchrony processing on auditory-evoked neural activity. Furthermore, because the visualizer display did not contain any dynamics at 40 Hz (Mossbridge, Grabowecky, & Suzuki, 2013), the 40-Hz ASSR was uncontaminated by any visual-evoked activity; thus, the magnitude of modulation of 40-Hz ASSR by AV synchrony reflected the sensitivity of crossmodal synchrony processing that influences auditory-evoked activity. We demonstrated that the left-frontal component of the 40-Hz ASSR was significantly reduced when the music and the visualizer were played asynchronously relative to when they were played synchronously, suggesting that the left-frontal 40-Hz ASSR reflects neural sensitivity to the crossmodal alignment between auditory and visual dynamics (Mossbridge, Grabowecky, & Suzuki, 2013). 
If the association between the behavioral ability to judge AV synchrony and reading comprehension that was demonstrated in Experiment 1 is indicative of a general association between the mechanisms that process AV synchrony and the mechanisms that enable reading comprehension, the electrophysiological sensitivity to AV synchrony, as reflected in the left-frontal 40-Hz ASSR, should also be associated with reading comprehension. We tested this prediction in a new group of participants. They completed the reading tasks and the comprehension test first. They then experienced the synchronously and asynchronously presented music and visualizer displays, during which their 40-Hz ASSR was recorded.
Twenty-eight adults (aged 18–29 years) who responded to a poster on the Northwestern University campus (Evanston, IL) gave informed consent to participate for monetary compensation. All were right-handed native English speakers and had normal hearing and normal or corrected-to-normal vision. They were individually tested in a dimly lit room that was electrically shielded for EEG recording.
We have previously identified a left-frontal 40-Hz ASSR sensitive to AV synchrony with these same participants (Mossbridge, Grabowecky, & Suzuki, 2013; see Figure 3A). In that study, we also administered the two reading tasks, the scrambled order and correct order tasks, identical to those used in Experiment 1 as control conditions. We also gave the comprehension test that was identical to that used in Experiment 1. However, because the comprehension test was given primarily as a way to enforce comprehension effort during the correct order reading task, the scores were not analyzed. The reading tasks and the comprehension test were administered before recording the 40-Hz ASSR evoked by the synchronously and asynchronously presented music and visualizer displays (see below). Here, we analyzed the participants' comprehension scores as well as the sensitivity of their left-frontal 40-Hz ASSR to AV alignment to determine whether the left-frontal electrophysiological sensitivity to AV synchrony predicted reading comprehension performance.
Stimuli and Procedure
As in Experiment 1, the extended text (identical to that used in Experiment 1) was presented twice, once in a scrambled order and once in the correct order, while participants were instructed to respond to the word “and” as quickly and accurately as possible. Again, this manipulation allowed us to confirm that participants made an effort to comprehend the story in the correct order condition as well as to reasonably control for individual differences in the amount of effort, arousal, and carefulness. Classical music was played while participants performed these reading conditions for reasons unrelated to the current investigation (see Mossbridge, Grabowecky, & Suzuki, 2013); however, many people routinely read text with background music, so that the presence of music should not be problematic for assessing reading comprehension. The comprehension test (identical to that used in Experiment 1) was administered after the correct order condition.
After the two reading conditions (the order was counterbalanced across participants) and the comprehension test, the relevant EEG data were recorded while participants experienced synchronous and asynchronous AV presentations. The music was Beethoven's Moonlight Sonata, which was amplitude-modulated at 40 Hz (sinusoidal modulation at 100% depth) to evoke an ASSR reflecting auditory sensory activity (see below). We used the iTunes Jelly visualizer, which generated aesthetically pleasing dynamic visual displays by primarily matching changes in the dynamics of the music to changes in the luminance of the visual elements (see Mossbridge, Grabowecky, & Suzuki, 2013, for the verification of this relationship). The visualizer also changed other visual features such as color, motion, and pattern organization in synchrony with changes in the auditory intensity and pitch of the music, but these instances of AV synchronization were less apparent.
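For readers unfamiliar with amplitude modulation, a minimal Python sketch of sinusoidal modulation at 40 Hz and 100% depth is given below. The exact envelope formula, its normalization to a peak of 1, and the function name `amplitude_modulate` are assumptions. The text states only the modulation frequency, shape, and depth.

```python
import math


def amplitude_modulate(signal, sample_rate_hz, mod_hz=40.0, depth=1.0):
    """Multiply a signal by a sinusoidal envelope.

    With depth = 1.0 (100%), the envelope swings between 0 and 1,
    so the sound is fully silenced once per modulation cycle.
    """
    out = []
    for n, x in enumerate(signal):
        t = n / sample_rate_hz
        # Normalizing by (1 + depth) keeps the envelope peak at 1 (an assumption).
        envelope = (1.0 + depth * math.sin(2.0 * math.pi * mod_hz * t)) / (1.0 + depth)
        out.append(x * envelope)
    return out


# Example: modulate one 40-Hz cycle of a constant-amplitude signal sampled at 8000 Hz.
sample_rate_hz = 8000
carrier = [1.0] * 200  # 200 samples = 25 msec = one modulation cycle
modulated = amplitude_modulate(carrier, sample_rate_hz)
```

Because the envelope repeats every 25 msec, any neural response that follows it appears as a 40-Hz component phase-locked to the sound.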
Note that the music and the corresponding visualizer displays included temporal organization across multiple timescales (e.g., rhythmic beats, faster arpeggios, and slower variations such as sweeping crescendos). The use of a complex multiscale dynamic structure allowed us to increase the chance of identifying an electrophysiological correlate of AV synchrony sensitivity without a priori knowledge of the temporal scale to which underlying mechanisms might be tuned. It also ensured that any identified electrophysiological correlate of crossmodal synchrony processing would be operative in a complex natural environment where crossmodal dynamics are defined by multiple timescales in the context of multiple concurrently varying features. A drawback of using AV stimuli with complex dynamics is that future research would be necessary to differentiate the specific contributions from individual timescales and features.
Participants listened to an identical 2-min portion of the music twice, once while watching the visualizer display presented in synchrony with the music—the AV-aligned condition—and once while watching the visualizer display presented with a 30-sec delay—the AV-misaligned condition (the condition order was counterbalanced across participants). The use of a relatively long visual delay in the AV-misaligned condition allowed us to introduce substantial asynchrony across multiple timescales contained in the complex dynamics. To avoid potential stimulus artifacts, we freshly generated a new visualizer display for the AV-aligned and AV-misaligned conditions for each participant. Thus, the visualizer display was different each time in feature-specific details (e.g., in colors, shapes, and motions) but was still dynamically synchronized with the music, so that we were able to identify the scalp pattern of ASSR that was sensitive to AV synchrony independently of stimulus-specific contributions. Video examples of the AV-aligned and AV-misaligned conditions are available online at www.dropbox.com/sh/yrlwfu96qyhum6c/73efhFYfjI. The first two videos provide examples of the AV-aligned and AV-misaligned conditions, respectively.
EEG was recorded using a 64-channel (10–20 configuration) Biosemi system with a nose reference with additional electrodes placed lateral to each eye for recording horizontal EOG activity and under the left eye for recording vertical EOG activity, including blinks. Data were sampled at 1024 Hz and band-pass filtered between 0.1 and 100 Hz. The resulting EEG waveforms were segmented into 1-sec epochs; epochs with eye blinks and muscle artifacts were manually removed based on vertical EOG activity (generally >100 μV, but adjusted for some participants as necessary), and epochs with saccades were manually removed based on horizontal EOG activity (>100 μV, but adjusted as necessary). The first 80 artifact-free epochs from each participant for each condition were transformed into current source density (CSD) maps using CSDtoolbox Version 1.1 (psychophysiology.cpmc.columbia.edu/Software/CSDtoolbox) to obtain a reference-free and high-spatial-resolution measure of EEG signals (Tenke & Kayser, 2005).
ASSR amplitude was computed (for each electrode and each participant) by averaging the CSD-transformed EEG waveforms across the 80 epochs, taking a fast Fourier transform of the average waveform (using MATLAB 7.4.0; The MathWorks, Inc.), and then extracting the amplitude of the Fourier component at 40 Hz (at 1-Hz resolution). Averaging the EEG waveforms across the 80 epochs before taking a fast Fourier transform reduced any contributions from non-phase-locked responses, thus isolating the stimulus-evoked neural responses. CSD-transformed EEG signals offer a conservative estimate of the locations of the underlying neural generator (Tenke & Kayser, 2012). Lateralized CSD-transformed EEG signals in particular reflect the activity of sources that can be reasonably assumed to be located on the same side of the brain (e.g., Bernier, Burle, Hasbroucq, & Blouin, 2009; Kayser et al., 2006; Carbonnell, Hasbroucq, Grapperon, & Vidal, 2004). Furthermore, source localization results from EEG, magnetoencephalography, and PET studies suggest that ASSR evoked by amplitude modulation at 40 Hz arises from the primary auditory cortex with additional contributions from subcortical structures and auditory association areas including the superior temporal plane (e.g., Ross et al., 2005; Picton et al., 2003; Herdman et al., 2002; Pastor et al., 2002; Gutschalk et al., 1999). Because EEG signals from subcortical structures are not lateralized on the scalp, a lateralized modulation of ASSR can be reasonably attributed to a modulation of auditory-evoked cortical activity in the same hemisphere.
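The ASSR amplitude computation described above (average across epochs, then extract the 40-Hz Fourier component) can be sketched as follows. This is a hedged illustration rather than the original MATLAB code: it evaluates a single discrete Fourier coefficient directly instead of calling a fast Fourier transform, and the function names and the peak-amplitude scaling (2/N) are assumptions. With 1-sec epochs sampled at 1024 Hz, the frequency resolution is 1 Hz and the 40-Hz component falls exactly on bin 40.

```python
import cmath
import math


def average_epochs(epochs):
    """Average equal-length epochs sample by sample (time-domain averaging)."""
    n = len(epochs[0])
    return [sum(ep[i] for ep in epochs) / len(epochs) for i in range(n)]


def amplitude_at(waveform, sample_rate_hz, freq_hz):
    """Peak amplitude of the Fourier component nearest freq_hz."""
    n = len(waveform)
    k = round(freq_hz * n / sample_rate_hz)  # bin index; 40 for a 1-sec epoch
    coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(waveform))
    return 2.0 * abs(coeff) / n  # 2/N scaling is an assumed convention


def assr_amplitude(epochs, sample_rate_hz=1024.0, freq_hz=40.0):
    """Average first, then transform, so non-phase-locked activity cancels."""
    return amplitude_at(average_epochs(epochs), sample_rate_hz, freq_hz)
```

Averaging before the transform is the step that matters: a 40-Hz component whose phase varies from epoch to epoch shrinks in the average, whereas a phase-locked component survives at full amplitude.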
We reported (Mossbridge, Grabowecky, & Suzuki, 2013) that the ASSR to the amplitude-modulated Moonlight Sonata was obtained bilaterally from the frontal scalp regions, consistent with prior results (e.g., Picton et al., 2003 [also using CSD transform]). In particular, the ASSR amplitude from the left-frontal scalp region was selectively reduced in the AV-misaligned condition relative to the AV-aligned condition (Figure 3A). We further showed that neither the degree of ASSR phase-locking to the amplitude modulation of the music (measured as intertrial phase coherence) nor the stimulus non-phase-locked oscillatory EEG activity was modulated by AV synchrony, suggesting that it is the amplitude of the left-lateralized auditory-evoked cortical response rather than its fidelity (phase locking) or other ongoing nonsensory oscillatory neural activity that is sensitive to AV synchrony (Mossbridge, Grabowecky, & Suzuki, 2013). We thus computed the difference in the left-frontal ASSR amplitude between the AV-aligned and AV-misaligned conditions for each participant as an index of electrophysiological sensitivity to AV synchrony. We determined whether a larger left-frontal ASSR index, indicative of greater neural sensitivity to AV synchrony, was associated with a higher reading comprehension score.
The extended text (one word at a time) and the visualizer display were presented on a 21-in. color CRT monitor (1024 × 768 pixel resolution) at 60-Hz refresh rate, and the amplitude-modulated music was presented through a pair of Sennheiser Pro headphones at an average level of 70 dB SPL(A). Stimulus presentations and behavioral tasks were controlled using a MacBook Pro laptop computer (OS10.6) with Presentation software (Version 11.0, Build 04.25.07, www.neurobs.com). Each participant was seated in a comfortable armchair (to reduce muscle artifacts in the EEG signals) at 120 cm from the display monitor.
The ASSRs obtained were sharply peaked at 40 Hz (the rate of amplitude modulation), indicating an appropriate level of signal-to-noise ratio (Figure 3A, top). Crucially, the magnitude of the left-frontal ASSR index (AV-aligned condition minus AV-misaligned condition), which we had previously identified to reflect the sensitivity of the left-lateralized auditory cortical mechanisms to AV synchrony (Mossbridge, Grabowecky, & Suzuki, 2013), was positively correlated with the comprehension score, r = .51, t(27) = 3.04, p < .006 (Figure 2). This indicates that greater sensitivity of the left-lateralized auditory cortical mechanisms to AV synchrony is associated with superior reading comprehension.
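For concreteness, the index and its correlation with comprehension can be sketched as below. The function names are hypothetical, and the sketch assumes one ASSR amplitude per participant per condition and a plain Pearson correlation; the squared coefficient gives the proportion of variance accounted for.

```python
import math


def synchrony_index(aligned, misaligned):
    """Per-participant ASSR index: AV-aligned minus AV-misaligned amplitude."""
    return [a - m for a, m in zip(aligned, misaligned)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_explained(r):
    """Proportion of variance accounted for by a simple correlation."""
    return r * r
```

A correlation near .5, for example, corresponds to roughly a quarter of the variance.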
As in Experiment 1, results from the “and”-detection task provided evidence in support of a specific association between the neural sensitivity to AV synchrony and reading comprehension performance. Replicating Experiment 1, responses to the word “and” were overall significantly worse (longer RTs and higher error rates) in the dual-task correct order condition than in the scrambled order condition (M = 637 [SE = 17] vs. M = 573 [SE = 12] msec, t(27) = 5.37, p < .0001, for RTs; M = 0.17 [SE = 0.02] vs. M = 0.08 [SE = 0.02], t(27) = 2.97, p < .008, for error rates), confirming that the participants made an effort to comprehend the extended text in the correct order condition (also reported in Mossbridge, Grabowecky, Paller, et al., 2013). Neither our measures of comprehension effort (the correct order RT and correct order error rate residualized to their scrambled order counterparts), our measure of general arousal (the scrambled order RT), nor our measure of carefulness (the scrambled order error rate) was significantly correlated with the comprehension score (r = −.02, t(24) = 0.12, p > .90, for the residualized RT; r = −.26, t(24) = 1.30, p > .20, for the residualized error rate; r = −.29, t(24) = 1.46, p > .15, for the scrambled order RT; r = −.16, t(24) = 0.81, p > .42, for the scrambled order error rate). Thus, the comprehension score in this experiment was unlikely to have been influenced by individual differences in comprehension effort, general arousal, or carefulness.
Nevertheless, because some of the nonsignificant correlation coefficients were moderate, we tested a multiple regression model including all these variables as regressors; only the left-frontal ASSR index significantly predicted the comprehension score (t(20) = 2.58, p < .019, for the left-frontal ASSR index; t(20) = 0.76, p > .45, for the residualized RT; t(20) = −0.85, p > .40, for the residualized error rate; t(20) = −0.09, p > .92, for the scrambled order RT; t(20) = −0.74, p > .46, for the scrambled order error rate). Note that two outliers were removed in the correlation analyses involving the “and”-detection performance so that all data points for all examined correlations remained within the respective 95% confidence ellipses. Nevertheless, even if we included these outliers, the left-frontal ASSR index would still be the only significant predictor of the comprehension score in the multiple regression model.
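The residualized effort measures used in these analyses can be illustrated with a short sketch. It assumes that residualization means taking the residuals of an ordinary least-squares regression (with an intercept) of the correct order measure on its scrambled order counterpart; the function name is hypothetical, and the exact procedure may have differed.

```python
def residualize(y, x):
    """Residuals of y after simple least-squares regression on x.

    Here y would be a correct order measure (RT or error rate) and x its
    scrambled order counterpart, one value per participant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]
```

By construction, the residuals are uncorrelated with the scrambled order measure, so they capture the portion of the correct order performance that is not predicted by baseline speed or accuracy.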
We focused on the left-frontal ASSR index because our prior study (Mossbridge, Grabowecky, & Suzuki, 2013) found that the ASSR from the left-frontal scalp region was sensitive to AV synchrony (Figure 3A), whereas the ASSR from the right-frontal scalp region was not (Figure 3B). Interestingly, the ASSR index from the right-frontal scalp region was also positively correlated with the comprehension score, r = .52, t(27) = 3.16, p < .004 (Figure 2D). Note, however, that the right-frontal ASSR was not sensitive to AV synchrony in a consistent manner across participants; that is, the right-frontal ASSR was greater in the AV-aligned condition for some participants but greater in the AV-misaligned condition for others, with the right-frontal ASSR index (AV-aligned condition minus AV-misaligned condition) being largely evenly distributed in the positive and negative directions. Nevertheless, individuals with less negative or more positive values of the right-frontal ASSR index yielded higher comprehension scores. This raises the question of whether the left-frontal and right-frontal ASSR indices reflect similar or distinct AV timing mechanisms relevant to reading comprehension. Although we cannot provide a definitive answer, we present additional analyses that may be informative.
Although the reading tasks and the measurement of the 40-Hz ASSR sensitivity to music–visualizer synchrony were conducted at separate times, scalp EEG was recorded while participants performed the reading tasks in the correct and scrambled order conditions. We previously reported the analysis of the ERPs recorded during the reading tasks, time-locked to participants' responses to “and.” Specifically, the difference in a late (400–500 msec after stimulus onset) ERP component between the scrambled and correct order conditions at a midfrontal electrode was strongly associated with the comprehension score (Mossbridge, Grabowecky, Paller, et al., 2013). Because participants had to decode “and” in both the scrambled and correct order conditions, but in the correct order condition, they also had to process “and” in the context of syntactic and semantic analyses for story comprehension, the late midfrontal ERP difference likely reflected comprehension processes beyond word decoding. Thus, this ERP index allowed us to broadly assess whether the AV timing mechanisms reflected in the left-frontal and right-frontal ASSR indices are relevant to the process of word decoding (not reflected in the ERP index), the post-word-decoding processes reflected in the ERP index, or additional post-word-decoding processes not reflected in the ERP index.
We tested a multiple regression model with the ERP index, the left-frontal ASSR index, and the right-frontal ASSR index as regressors to predict the comprehension score. The model accounted for a large proportion (62%) of variance in the comprehension score, F(3, 24) = 15.42, p < .0001. Whereas the ERP index and the left-frontal ASSR index made significant contributions to the model (t(24) = 4.68, p < .0001, for the ERP index; t(24) = 2.98, p < .007, for the left-frontal ASSR index), the right-frontal ASSR index did not (t(24) = 0.91, p > .37). This result is illustrated in Figure 2E and F. Whereas the left-frontal ASSR index was strongly correlated with the comprehension score after controlling for the ERP index and the right-frontal ASSR index (Figure 2E), the right-frontal ASSR index was uncorrelated with the comprehension score after controlling for the ERP index and the left-frontal ASSR index (Figure 2F). Taken together, these results suggest that the left-frontal ASSR index, which was consistently sensitive to AV synchrony across individuals (Figure 3A), is relevant to either the process of word decoding or a component of post-word-decoding processes distinct from those reflected in the late midfrontal ERP index. The right-frontal ASSR index appears to provide only redundant information of lesser reliability.
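The notion of a correlation "after controlling for" the other regressors can be made concrete with a partial correlation sketch. It assumes the standard residual-based definition: regress both the predictor of interest and the comprehension score on the control variables, then correlate the two sets of residuals. All function names are hypothetical, and the small linear solver is included only to keep the example self-contained.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_residuals(y, covariates):
    """Residuals of y after least-squares regression on covariates plus intercept."""
    rows = [[1.0] + [cov[i] for cov in covariates] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = solve(xtx, xty)  # normal equations
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(y, x, covariates):
    """Correlation between y and x after removing the covariates from both."""
    return pearson_r(ols_residuals(x, covariates), ols_residuals(y, covariates))
```

In this framing, the left-frontal analysis would take the comprehension score as `y`, the left-frontal ASSR index as `x`, and the ERP index and right-frontal ASSR index as `covariates`; swapping the two ASSR indices gives the right-frontal analysis.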
The perceptual system utilizes synchronized AV signals to integrate auditory and visual information belonging to the same object (e.g., Guzman-Martinez et al., 2012; Iordanescu et al., 2008, 2010, 2011; Van der Burg et al., 2008, 2010; Smith et al., 2007; Molholm et al., 2004; Shimojo & Shams, 2001; Driver & Spence, 1998; Driver, 1996; Stein et al., 1989). We considered the possibility that mechanisms that process crossmodal synchrony may also support cognition by facilitating the dynamic coordination of internal and sensory processes. We investigated reading comprehension as a case study partly because many of the component processes associated with reading (e.g., word decoding, semantic access, working memory, and semantic integration) could potentially benefit from temporally coordinated multimodal processing (see below) and partly because prior research using simple auditory and visual stimuli suggested a relationship between reading performance and the ability to compare or segregate auditory and visual dynamics (e.g., Hairston et al., 2005; Kujala et al., 2001; Rose et al., 1999; Birch & Belmont, 1964; see the Introduction section for details).
We used an individual differences approach to test the hypothesis that AV synchrony processing may play a role in reading comprehension. Behaviorally, we demonstrated that a greater ability to judge AV synchrony (but not a greater ability to judge AV temporal order) predicted superior reading comprehension performance (Figure 1). Electrophysiologically, we demonstrated that greater sensitivity of a left-frontal 40-Hz ASSR (Mossbridge, Grabowecky, & Suzuki, 2013) to AV synchrony (vs. asynchrony) predicted superior reading comprehension performance (Figure 2). These associations were sizable in that the behavioral ability to judge AV synchrony accounted for 16% and the left-frontal ASSR sensitivity to AV synchrony accounted for 25% of the variance in reading comprehension performance. The fact that robust relationships between AV synchrony processing and reading comprehension were demonstrated using both simple (flashes and beeps in Experiment 1) and complex (music and visualizer in Experiment 2) nonlinguistic stimuli suggests that general AV synchrony processing mechanisms that operate under diverse circumstances are relevant to reading comprehension.
Future research needs to elucidate how AV synchrony processing may contribute to reading comprehension. For example, we have demonstrated that both behavioral and electrophysiological sensitivities to AV synchrony are associated with reading comprehension performance. Although this provides converging evidence for the association between AV synchrony processing and reading comprehension, because the two experiments involved separate groups of participants, the current results do not inform us as to whether behavioral synchrony judgments and the left-frontal ASSR index account for a common source or distinct sources of variance in reading comprehension performance. An interesting possibility is that these explicit and implicit sensitivities to AV synchrony may contribute to different processes relevant to reading comprehension (see below).
The current results are also entirely correlational. We are reasonably certain that the obtained association is specifically between AV synchrony processing and reading comprehension because we have ruled out potential contributions from individual differences in effort, arousal, carefulness (as nonsignificant covariates), and attention (because of similar attention demands for the synchrony judgment task that is associated with reading comprehension and the temporal order judgment task that is not associated with reading comprehension). However, to demonstrate a causal effect of AV synchrony processing on reading comprehension, the operation of AV synchrony processing needs to be experimentally manipulated to see if reading comprehension is altered as predicted. For example, improving AV synchrony judgment with training (e.g., Powers, Hillock, & Wallace, 2009) may improve reading. Similarly, altering AV synchrony perception using an adaptation procedure (e.g., Roach, Heron, Whitaker, & McGraw, 2010) may temporarily interfere with reading. Along these lines, it is interesting to note that, in a longitudinal study, musical training that improves attention to AV synchrony protected low-income children from the decline in reading performance exhibited by their musically untrained peers (Slater et al., 2014).
To understand the mechanisms through which AV synchrony processing is associated with reading comprehension, it would be informative to identify the neural sources of the left-frontal 40-Hz ASSR sensitive to AV synchrony. This could be accomplished, for example, with electrocorticography using subdural surface and depth electrodes placed directly on or in the brains of patients with epilepsy. Electrocorticography allows recording of electrophysiological signals at a high signal-to-noise ratio, 1-msec temporal resolution, and 3- to 5-mm spatial resolution for localizing neural sources (e.g., Brang et al., 2015). Using this technique, one could identify brain regions that generate 40-Hz ASSR that is maximally modulated by AV synchrony and then determine in which of those region(s) the modulation is most strongly associated with reading performance. The critical neural source(s) may include left posterior STS (left pSTS); pSTS activity tends to be modulated by multisensory synchrony (see Keetels & Vroomen, 2012, for a review), and left pSTS activity in particular reflects the reliability of speech-related audiovisual signals (Nath & Beauchamp, 2011) and accounts for individual differences in speech-related AV integration (Nath & Beauchamp, 2012; Nath, Fava, & Beauchamp, 2011). The identified neural sources could also be used as the seed to constrain an EEG source modeling algorithm (e.g., Oostenveld, Fries, Maris, & Schoffelen, 2011; Scherg, 1990). Such a model would allow noninvasive assessments of the AV synchrony mechanisms that are closely associated with reading processes.
The neural sources of the association between AV synchrony processing and reading comprehension need to be interpreted in conjunction with functional considerations. To this end, it would be informative to investigate the functional sources of the association between AV synchrony processing and reading comprehension. Reading comprehension is thought to involve multiple perceptual and cognitive processes, including orthographic-to-phonological decoding (word decoding), semantic access, working memory, and higher-order processes that integrate meaning across sentences and paragraphs (e.g., Friederici, 2012; Ferstl, Neumann, Bogler, & von Cramon, 2008; Rapp et al., 2007; Thompson-Schill, Bedny, & Goldberg, 2005; Van den Broek et al., 2005; King & Kutas, 1995; Martin, Shelton, & Yaffee, 1994; Hoover & Gough, 1990; Gough & Tunmer, 1986).
AV synchrony processing might primarily facilitate word decoding by appropriately synchronizing the generation of phonological representations with orthographic processing (e.g., Lehongre et al., 2011; Vellutino et al., 2004; Breznitz & Misra, 2003; Breznitz, 2002; Adams, 1990). In fact, the analysis of the current results in conjunction with our prior results has indicated that the left-frontal ASSR sensitivity to AV synchrony is associated with reading comprehension independently of a late midfrontal ERP component that reflects comprehension processes beyond word reading (Mossbridge, Grabowecky, Paller, et al., 2013). This outcome is consistent with the possibility that AV synchrony processing contributes to orthographic-to-phonological decoding. Nevertheless, because word decoding likely involves semantic access (e.g., Hoover & Gough, 1990), AV synchrony processing may also contribute to higher-order reading-related processes distinct from those reflected in the ERP component. In particular, it might contribute to semantic access and/or working memory by providing synchronized multimodal cues that may facilitate access to long-term memory (for semantic retrieval) and/or may enhance the activation of working memory mechanisms (e.g., Kutas & Federmeier, 2011; Federmeier & Laszlo, 2009; King & Kutas, 1995).
It would be informative to separately evaluate these component processes, such as evaluating word decoding using the Phonemic Decoding Efficiency subtest of the Test of Word Reading Efficiency (Torgesen, Wagner, & Rashotte, 1999), semantic access using the N400 ERP component (e.g., Kutas & Federmeier, 2011), and semantic working memory using a span task (e.g., Martin et al., 1994), to determine how the sensitivity to AV synchrony might be associated with these component processes. An interesting possibility is that the explicit crossmodal synchrony processing tapped by our behavioral task and the implicit processing tapped by our left-frontal ASSR index might be differentially associated with word decoding, semantic access, and/or semantic working memory. It is possible that the associations between the behavioral and/or electrophysiological AV synchrony sensitivity and extended text comprehension may persist after controlling for word decoding, semantic access, and semantic working memory. Such an outcome would suggest that crossmodal synchrony processing benefits higher-order processes that integrate causal and referential relationships to construct coherent interpretations (e.g., Rapp et al., 2007; Van den Broek et al., 2005; Gough & Tunmer, 1986), potentially through appropriately synchronizing the generation of mental imagery, attention shifts, and/or eye movements with the progression of semantic integration across the extended text.
Whereas the current study focused on reading comprehension, there is evidence suggesting a close relationship between language comprehension and production (e.g., Humphreys & Gennari, 2014; Gennari & MacDonald, 2009). Thus, greater sensitivity to AV synchrony might benefit language production as well as comprehension. Note that pFC, including the left inferior frontal gyrus, appears to be commonly recruited for tasks requiring language comprehension, production, working memory, and inhibitory control (e.g., Humphreys & Gennari, 2014; Thompson-Schill et al., 2005). This raises the intriguing, albeit highly speculative, possibility that the mechanisms underlying the detection of crossmodal synchrony might play a general role in temporally coordinating sensory, attention, working memory, and inhibitory processes.
In summary, the current results provide converging behavioral and electrophysiological evidence suggesting that sensitivity to AV synchrony (vs. asynchrony) is associated with reading comprehension performance. These results suggest avenues for future research to investigate the causality as well as the neural and functional sources of this association.
This study was supported by a National Institutes of Health grant (R01 EY021184).
Reprint requests should be sent to Satoru Suzuki, Department of Psychology, Northwestern University, 2029 Sheridan Rd., Evanston, IL 60208, or via e-mail: email@example.com.