Abstract

Information from different modalities is initially processed in different brain areas, yet real-world perception often requires the integration of multisensory signals into a single percept. An example is the McGurk effect, in which people viewing a speaker whose lip movements do not match the utterance perceive the spoken sounds incorrectly, hearing them as more similar to those signaled by the visual than by the auditory input. This indicates that audiovisual integration is important for generating the phoneme percept. Here we asked when and where the audiovisual integration process occurs, providing spatial and temporal boundaries for the processes generating phoneme perception. Specifically, we wanted to separate audiovisual integration from other processes, such as simple deviance detection. Building on previous work employing ERPs, we used an oddball paradigm in which task-irrelevant audiovisually deviant stimuli were embedded in strings of non-deviant stimuli. We also recorded the event-related optical signal, an imaging method combining spatial and temporal resolution, to investigate the time course and neuroanatomical substrate of audiovisual integration. We found that audiovisual deviants elicit a short-duration response in the middle/superior temporal gyrus, whereas audiovisual integration elicits a more extended response that also involves inferior frontal and occipital regions. Interactions between audiovisual integration and deviance detection processes were observed in the posterior/superior temporal gyrus. These data suggest that dynamic interactions between inferior frontal cortex and sensory regions play a significant role in multimodal integration.

INTRODUCTION

Information from different sensory modalities is typically processed in separate brain areas, yet it often needs to be fused into a single percept. Audiovisual speech perception is one of the clearest examples of integrated multisensory processing. It is well established that seeing the mouth and face movements involved in producing speech can improve its intelligibility, especially in noisy environments (Gilbert, Lansing, & Garnsey, 2012; Winkler, Horvath, Weisz, & Trejo, 2009; Ross, Saint-Amour, Leavitt, Javitt, & Foxe, 2007; Sams, Möttönen, & Sihvonen, 2005; Sumby & Pollack, 1954). Audiovisual integration in speech perception is so prominent that it can sometimes lead to illusory percepts, as is the case for the McGurk effect (McGurk & MacDonald, 1976). In the McGurk effect, people looking at a speaking face whose lip motion does not match the utterance typically hear a sound that is closer to the visual rather than the auditory input.

The McGurk effect indicates that audiovisual (AV) integration must occur before perception. Yet, there is anatomical separation between unimodal sensory processing areas and multimodal regions. This suggests that remote areas of the brain must interact before the fused percept reaches consciousness. In this article, we are asking the following question: When and where does AV integration occur within the context of the cascade of processing leading to percept formation?

We used McGurk stimuli to investigate the time course and neuroanatomical substrate of AV integration while separating it from other processes (such as attentional or response selection processes) that may also occur in multimodal paradigms. This is more easily achieved when AV integration is incidental to the task, and participants are not directly asked to report what they hear, such as in passive bisensory “oddball” paradigms (e.g., Colin, Radeau, Soquet, & Deltenre, 2004; Colin et al., 2002; Sams et al., 1991). In addition, the fact that visual information can induce changes in the auditory percept suggests that the phenomenon of AV integration occurs very early during processing. Therefore, here we concurrently recorded electrophysiological (ERPs; Fabiani, Gratton, & Federmeier, 2007) and optical data (event-related optical signal [EROS]; Gratton, Corballis, Cho, Fabiani, & Hood, 1995; see Gratton & Fabiani, 2010, for a review) in the context of a passive AV oddball paradigm to capture the timing and localization of the brain activations related to AV integration.

Electrophysiological and magnetoencephalographic studies have sometimes used McGurk stimuli in the context of passive bisensory oddball paradigms (e.g., Colin et al., 2002, 2004; Sams et al., 1991) and were the first to demonstrate that AV integration takes place within the first few hundred milliseconds of the perceptual processing stream. In these paradigms rare/deviant AV stimuli,1 for which the auditory and visual tracks provide mismatching information, are presented in the midst of frequent/standard stimuli for which the auditory and visual tracks are matching. Under these conditions, deviant and standard stimuli are perceived as different despite their identical auditory tracks and elicit the AV analogue of an MMN (avMMN; Näätänen, Paavilainen, Rinne, & Alho, 2007; Näätänen & Michie, 1979).

The MMN is an automatic brain response elicited when a stimulus (labeled “deviant”) is perceived as a violation of the regularity or expectancy established by a set of previous stimuli (labeled “standards;” Schröger, 2007; Winkler, 2007). The MMN has been widely used for investigating perceptual processes in the auditory domain (auditory MMN or aMMN) and is typically identified as a negativity occurring between 150 and 200 msec after the onset of a deviant auditory stimulus, most prominently at the Fz electrode site. MMN-like responses have also been found in other sensory modalities, including visual (Kimura, Katayama, Ohira, & Schröger, 2009; Czigler, 2007; Kimura, Katayama, & Murohashi, 2005) and somatosensory (Kekoni et al., 1997). Although the aMMN is regarded as a tool for probing auditory sensory memory (Schröger, 2007), the memory representations involved in generating the aMMN are not necessarily unisensory but can in fact be multisensory (Winkler et al., 2009; see also Butler, Foxe, Fiebelkorn, Mercier, & Molholm, 2012; Butler et al., 2011).

The passive oddball paradigm allows for the investigation of AV perceptual processes without contamination by variations in attentional load (Buchan & Munhall, 2012; Keil, Müller, Ihssen, & Weisz, 2012) or task demands (Besle, Bertrand, & Giard, 2009), differently from paradigms requiring active discrimination between matching and mismatching stimuli (e.g., Saint-Amour, De Sanctis, Molholm, Ritter, & Foxe, 2007; Skipper, van Wassenhove, Nusbaum, & Small, 2007; Skipper, Nusbaum, & Small, 2005; Callan et al., 2003; Möttönen, Krause, Tiippana, & Sams, 2002). Another problem with paradigms requiring active responses to McGurk stimuli is that they may generate response conflict conditions because of the visual–auditory mismatch. Because both conflict detection and AV speech perception activate similar regions in the frontal cortex (Koechlin, Ody, & Kouneiher, 2003), it becomes difficult to isolate AV integration processes using active classification paradigms. Here we included a series of control conditions derived from previous studies (e.g., Jacobsen & Schröger, 2001; Näätänen & Alho, 1997; Schröger & Wolff, 1996; see also Saint-Amour et al., 2007; Möttönen et al., 2002) to distinguish between AV integration and other processes (see Table 1 and Methods for details).

Table 1. 

Combined-effects and avMMN Contrasts

A. Combined-effects Contrast

| Row | Condition | Auditory Stimuli | Visual Stimuli | AV Integration | AV Deviance Detection |
| 1 | AV deviant (McGurk) in AVB | /ba/ | /ga/ | incongruent | rare |
| 2 | AV standard in AVB | /ba/ | /ba/ | congruent | frequent |
| 1–2 | AV deviant (McGurk) – AV standard in AVB | | /ga–ba/ (visual deviance) | incongruent–congruent | rare–frequent |
| 3 | V deviant in VB | /–/ | /ga/ | na | na |
| 4 | V standard in VB | /–/ | /ba/ | na | na |
| 3–4 | V deviant – V standard in VB | | /ga–ba/ (visual deviance) | | |
| [1–2]–[3–4] | [AV deviant – AV standard in AVB] – [V deviant – V standard in VB] | | X | incongruent–congruent | rare–frequent |

B. avMMN Contrast

| Row | Condition | Auditory Stimuli | Visual Stimuli | AV Integration | AV Deviance Detection |
| 1 | AV deviant (McGurk) in AVB | /ba/ | /ga/ | incongruent | rare |
| 5 | AV deviant (McGurk) in AVCB | /ba/ | /ga/ | incongruent | frequent |
| 1–5 | AV deviant (McGurk) in AVB – AV deviant (McGurk) in AVCB | | | X | rare–frequent |

/–/ = absence of stimulus in that modality; X = effect removed by subtraction process; na = not applicable; AVB = audiovisual block; VB = visual block; AVCB = audiovisual control block.

Because the McGurk stimuli elicit an avMMN, we can hypothesize that the AV integration process generating the illusory auditory percept occurs before this fused representation is delivered to the preattentive change detector linked to the elicitation of the MMN. In other words, high-level multimodal processing must have taken place within the few hundred milliseconds that precede the avMMN. In this article, we aim to test this hypothesis. However, the distinction between AV integration and change detection, though theoretically important, is practically difficult. One of the reasons is that these two processes appear to be based on the activation of a similar frontotemporal network. For instance, fMRI studies show that the superior temporal gyrus (STG; Szycik, Stadler, Tempelmann, & Münte, 2012; Calvert, 2001; Calvert et al., 1997), the STS (Nath & Beauchamp, 2011, 2012; Beauchamp, Nath, & Pasalar, 2010), and the left inferior frontal cortex (Broca's area; Skipper et al., 2005, 2007; Miller & D'Esposito, 2005; Callan et al., 2003) are all involved in AV integration. A similar frontotemporal network is also commonly activated in MMN studies involving only auditory stimuli (Tse, Rinne, Ng, & Penney, 2013; Tse & Penney, 2007, 2008; Tse, Tien, & Penney, 2006).

This spatial superimposition problem can be addressed by a technique that combines spatial and temporal resolution. Therefore, we recorded EROS concurrently with ERPs in an AV integration task. EROS uses near infrared (NIR) light to measure the optical changes associated with neuronal responses. It is based on the measurement of the time taken by NIR photons to diffuse from a source to a detector placed a few centimeters apart on the surface of the head. This time changes as a result of the brain activity evoked by the presentation of a stimulus. The temporal resolution of EROS is in the tens of milliseconds and its spatial resolution is in the centimeter range. EROS has been used in the measurement of brain responses in the visual (e.g., Chiarelli, Di Vacri, Romani, & Merla, 2013; Gratton et al., 1995, 2006) and auditory modalities (e.g., Tse, Low, Fabiani, & Gratton, 2012; Sable et al., 2007; Fabiani, Low, Wee, Sable, & Gratton, 2006), as well as in preattentive MMN paradigms (e.g., Tse et al., 2006, 2013; Tse & Penney, 2007, 2008; Rinne et al., 1999). Measuring EROS in the current study affords the specificity necessary to resolve the spatial and temporal separation of frontal and temporal cortical activities elicited by the AV integration and deviance detection processes, whereas ERPs allow for the identification of critical times of interest and serve as a link to previously published work.
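For reference, the conversion between the phase shift measured at the light's modulation frequency and photon time of flight is straightforward. The sketch below assumes the 110-MHz modulation frequency described in the Methods; the function name is ours:

```python
import math

MOD_FREQ_HZ = 110e6  # modulation frequency of the NIR light source (see Methods)

def phase_to_picoseconds(phase_rad: float) -> float:
    """Convert a phase shift at the modulation frequency into photon delay (psec)."""
    period_ps = 1e12 / MOD_FREQ_HZ            # one modulation cycle is ~9090.9 psec
    return (phase_rad / (2 * math.pi)) * period_ps

# A full 2*pi phase rotation corresponds to exactly one modulation period:
assert round(phase_to_picoseconds(2 * math.pi)) == 9091
```

Because one modulation cycle lasts about 9 nsec, the picosecond-scale delay changes produced by neuronal activity correspond to very small phase shifts, which is why phase noise must be tightly controlled.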

In summary, our aims are as follows: (1) to identify differences in the spatiotemporal dynamics of the frontotemporal network during AV integration and change detection; (2) to detect brain activities related to AV integration that occur before change detection processes; and (3) to reveal possible interactions between these processes.

METHODS

Participants

Sixteen students (nine women, mean age = 23.7 years) at the University of Illinois at Urbana-Champaign were recruited and gave informed consent to participate in the study. The experimental procedures were approved by the campus institutional review board. All participants were native English speakers, right-handed according to the Edinburgh Handedness Inventory (Oldfield, 1971), and reported themselves to have normal or corrected-to-normal vision and hearing, to be in good health, and to be free from medications that may affect the CNS.

Experimental Design

Participants were run in an AV oddball task composed of sequences of AV stimuli, including rarely occurring McGurk trials (Figure 1), which were, however, incidental to the participant's task. The paradigm also included a set of control conditions that allowed for the dissociation of brain responses related to AV deviance detection from those related to AV integration and other phenomena (Table 1). Following an additive factor logic, we hypothesized that the brain activity distinguishing the McGurk oddball stimuli from the non-McGurk standard stimuli was given by the summation of three types of effects: (a) visual perceptual processes related to the difference between the visual tracks associated with these two types of stimuli (henceforth labeled “visual deviance detection effect”); (b) integration and conflict processing triggered by the presence of contrasting visual and auditory information in the two tracks (henceforth labeled “AV integration/conflict effect”); and (c) sensory-memory and attention capture effects related to the detection of change with respect to the context, occurring when an oddball stimulus was presented (henceforth labeled “AV deviance detection effect”).

Figure 1. 

Examples of the video (A and B) and audio (C and D) tracks of the speech stimuli. (A) Frames sampled from the video track of /ba/. (B) Frames sampled from the video track of a visual catch trial with a short duration mask starting at Frame 21; the duration of the gray color mask covering the mouth could be long (30 frames) or short (10 frames), and the mask could start at Frames 6, 21, or 36. The video tracks were preceded by a pixelated still image of the face to ensure smooth transition between videos. (C) Section of the audio track /ba/ corresponding to Frames 21–41. The end of Frame 25 demarcated the onset of the speech sound and time zero in the following analyses. (D) Section of the audio track of the auditory catch trial. A pure tone was inserted into the audio track from Frames 21 to 30. Similar to the visual mask in a visual catch trial, the duration of the tone could be long (30 frames) or short (10 frames), and the tone could start at Frame 6, 21, or 36.

The AV integration/conflict effect and AV deviance detection effect were dissociated by using three block types (Figure 2). Audiovisual blocks (AVBs) included two types of stimuli: standard stimuli, which were audiovisually congruent (and homogeneous), and deviant stimuli, which were audiovisually incongruent, with the auditory track equal to that of the standard stimuli but a different visual track (eliciting the McGurk effect). Visual blocks (VBs) were similar but included only the visual track. The purpose of this type of block was to control for differences in visual perceptual processes (and sequential effects) between the deviant and the standard stimuli, which should occur in both AVB and VB, as opposed to AV integration/conflict and deviance detection, which should occur in the AVB only. The third type of block (following Jacobsen & Schröger, 2001; Näätänen & Alho, 1997; Schröger & Wolff, 1996), the audiovisual control block (AVCB), also included a “rare” incongruent McGurk stimulus. However, instead of a set of homogeneous, repeating congruent standards, it included five different types of audiovisually congruent standards (i.e., /pa/, /ta/, /ǰa/ (ja), /ča/ (cha), /ka/). The purpose of this type of block was to distinguish between AV integration effects and deviance detection effects (Table 1).

Figure 2. 

Illustration of the stimulus properties in (A) AVBs, (B) VBs, and (C) AVCBs. In the AVB, 78.95% of trials were standards (A/ba/ V/ba/), 15.79% were deviants (McGurk phonemes A/ba/ V/ga/), and 5.26% were auditory or visual catch trials. The VB were similar to the AVB except for the absence of speech sound. The AVCB consisted of six types of equally probable (15.79%) speech sounds, including the McGurk stimuli. Auditory catch trials were standard or deviant stimuli with a pure tone inserted on the sound track (empty boxes). Visual catch trials were standard or deviant stimuli with a gray mask inserted on the video track (gray boxes). /ča/ and /ǰa/ correspond to the phonetic symbols of cha and ja, respectively.

Specifically, we computed the difference between the brain responses to AV incongruent deviants (McGurk stimuli) and the AV congruent standards (non-McGurk stimuli) in the AVB (Table 1A, [1–2]) and the difference between the deviant and standard (i.e., visual deviance effects of rare vs. frequent) stimuli in the VB (Table 1A, [3–4]). Visual deviance effects were removed by subtracting the difference in the VB from that in the AVB (i.e., Table 1A, [1–2]–[3–4]; labeled hereafter the “combined-effects contrast”). What remains after that subtraction includes AV deviance detection, AV integration, and any interaction between them (see Table 1A).

The next step toward isolating AV integration effects from AV deviance detection relies on crucial properties of the AVCB block. In the AVCB, six types of AV stimuli were presented randomly but equally often, including the same McGurk stimuli presented in the AV block. The probability of each of the six types of stimuli was matched with that of the deviant McGurk stimuli in the AV block. Thus, the McGurk stimuli in the AVCB were identical to those in the AVB and occurred equally often, but their context was different. In the AVB there was a single contrasting standard stimulus that established a pattern of regularity, whereas in the AVCB there were five other kinds of stimuli presented in random order but appearing just as often as the McGurk stimuli (i.e., without regularity). Audiovisual integration or conflict should occur for the McGurk stimuli in both the AVB and AVCB, but in the AVB an additional avMMN response should be evoked because only in that condition are the McGurk stimuli deviant once they are integrated. Thus, a comparison of the responses to the McGurk stimuli in the AVB and AVCB (Table 1B, [1–5]; labeled hereafter the “avMMN contrast”) allows us to isolate a “pure” avMMN effect.

Comparison of the combined-effects contrast and the avMMN contrast allows for a dissociation of the brain responses related to AV integration, AV deviance detection, and their possible interaction. The results of the two contrasts have deviance detection in common, whereas only the combined-effects contrast still includes AV integration effects. More interestingly, a result showing that the same piece of cortex is activated for both contrasts, but with different amplitudes and/or latency, could be taken to indicate the occurrence of an interaction between the AV integration and AV deviance detection processes. The additive factor logic requires that the dependent variables used (i.e., ERP and EROS measures) be linearly related to the constructs they are supposed to measure (i.e., postsynaptic neural activity) and is typically considered valid for both ERPs and EROS.
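The additive logic behind these two contrasts can be illustrated with a toy linear model (hypothetical amplitudes; interaction terms set to zero, so only the additive predictions are captured):

```python
# Toy linear model of the contrast logic (arbitrary amplitudes; interactions ignored).
V_BA, V_GA = 1.0, 1.5   # responses evoked by the /ba/ and /ga/ video tracks
INT = 2.0               # AV integration/conflict (incongruent audio + video)
DEV_V = 0.5             # visual deviance detection (rare video in a regular context)
DEV_AV = 3.0            # AV deviance detection (rare integrated percept)

r = {
    1: V_GA + INT + DEV_V + DEV_AV,  # AV deviant (McGurk) in AVB
    2: V_BA,                         # AV standard in AVB
    3: V_GA + DEV_V,                 # V deviant in VB
    4: V_BA,                         # V standard in VB
    5: V_GA + INT,                   # McGurk in AVCB: no regularity to violate
}

combined = (r[1] - r[2]) - (r[3] - r[4])   # combined-effects contrast, [1-2]-[3-4]
avmmn = r[1] - r[5]                        # avMMN contrast, [1-5]

assert combined == INT + DEV_AV   # integration + AV deviance detection survive
assert avmmn == DEV_V + DEV_AV    # deviance detection without integration
```

Under this model the two contrasts share the deviance-detection term, while only the combined-effects contrast retains the integration term, matching the logic of Table 1; any departure of the data from these additive predictions is evidence for an interaction.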

Catch trials, in which a tone or a visual mask overlapped the audio or visual track, respectively, were also included in all three types of blocks. Participants were instructed to respond to the duration of the masks with a button press. Unlike the tasks employed in other studies (e.g., counting the total number of auditory stimuli), this task required attention to both the auditory and visual channels but did not require participants to actively process the speech information, so that the avMMN could be measured without contamination from response conflicts.

Stimuli and Procedure

Videos of a male native English speaker pronouncing a syllable were presented to the participants. Videos with the speaker pronouncing the syllables /ba/, /ga/, /pa/, /ta/, /ǰa/ (ja), /ča/ (cha), /ka/ were recorded. The videos were sampled at 29.97 frames per second (440 × 680 pixels per frame), with a total of 69 frames (i.e., 2.2977 sec). The videos were aligned so that in all of them the onset of the sound occurred at Frame 25 (i.e., 832.5 msec after the onset of the video). The audio and video tracks of three sample videos of each syllable were recombined to produce nine variations of each syllable. The audio tracks of /ba/ and video tracks of /ga/ were recombined to produce nine variations of the McGurk /da/ syllable (Figure 1A, C). The video tracks of the three versions of /ga/ and /ba/ started at Frames 8, 10, 11 and Frames 20, 21, 23, respectively. In other words, the auditory information (i.e., change in the sound wave) lagged the visual information (i.e., mouth movement) by 2–12 frames. It is normal for visual information to precede the related auditory information in AV speech stimuli (Jessen & Kotz, 2013), as the lips and vocal tract are required to assume a particular shape in order for the air flowing through the tract to produce a specific sound.
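The frame-to-time conversions used here and in the Figure 1 caption are consistent with a nominal frame duration of 33.3 msec; hypothetical helper functions illustrating the arithmetic:

```python
# The reported timings (e.g., Frame 21 onset -> 666 msec; end of Frame 25 ->
# 832.5 msec) imply a nominal frame duration of ~33.3 msec. Hypothetical helpers:
MS_PER_FRAME = 33.3

def frame_onset_ms(frame: int) -> float:
    """Time elapsed at the start of frame n (1-indexed)."""
    return (frame - 1) * MS_PER_FRAME

def frame_offset_ms(frame: int) -> float:
    """Time elapsed at the end of frame n."""
    return frame * MS_PER_FRAME

assert abs(frame_onset_ms(21) - 666.0) < 1e-6     # catch-trial mask onset
assert abs(frame_offset_ms(25) - 832.5) < 1e-6    # speech-sound onset (time zero)
```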

There were two types of catch trial conditions (Figure 1B, D). In the visual catch trial condition, a gray-colored pixelated noise mask briefly covering the mouth area of the speaker was inserted in one of the regular video tracks. In the auditory catch trial condition, a brief tone of the same frequency as that of the fundamental of the male speaker's voice was inserted on one of the regular audio tracks. The first and last 10% of the tone duration were the intensity rise and fall periods of the tone, and the maximum intensity of the tone was the same as the average intensity across syllables. The duration of the mask or tone was either 10 frames (i.e., 333 msec) or 30 frames (i.e., 999 msec). The participants' task was to judge the duration of the visual or tone masks and make a button-press choice response accordingly (long or short). To make sure that participants attended to both the audio and visual tracks throughout the experiment, the mask or tone could appear on the 6th, 21st, or 36th frame of the video (i.e., 166.5 msec, 666 msec, or 1165.5 msec from onset of the video). The catch trials and the trial immediately following were not included in the analysis. Participants were not required to make explicit responses to indicate occurrence of the McGurk illusion. Although the McGurk illusion may not be elicited on every trial, this variability should only reduce the experimental effects rather than create false positives.

In each block, 9 of 57 trials (15.79%) were deviants, 3 were catch trials (5.26%), and the remaining 45 were standards. The ratio of standard to deviant catch trials was maintained in each block, and 50% of the catch trials were visual and 50% were auditory. A pixelated still image of the male speaker was presented between trials for about 200 msec, so that the SOA of the videos was 2500 msec. A total of 42 blocks, 14 of each type, were presented to each participant. Half of the participants were presented with the block order AVB, VB, AVCB, whereas the other half were presented with the reverse block order for counterbalancing purposes.
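The trial composition of an AVB can be sketched as follows (hypothetical generator; any ordering constraints beyond random shuffling are not specified in the text and therefore not enforced here):

```python
import random

def make_avb_sequence(n_trials=57, n_deviants=9, n_catch=3, seed=None):
    """Sketch of one AVB trial sequence: 45 standards (78.95%), 9 McGurk deviants
    (15.79%), and 3 catch trials (5.26%), in random order."""
    trials = (['standard'] * (n_trials - n_deviants - n_catch)
              + ['deviant'] * n_deviants
              + ['catch'] * n_catch)
    random.Random(seed).shuffle(trials)
    return trials

seq = make_avb_sequence(seed=1)
assert len(seq) == 57 and seq.count('deviant') == 9 and seq.count('catch') == 3
```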

EROS Recording and Analysis

A frequency-domain oximeter (Imagent, ISS, Inc., Champaign, IL) was used to produce frequency-modulated (110 MHz) NIR light (830 nm) from laser diodes; the light was channeled to the participant's scalp via 40 plastic-clad silica optical fibers (400-μm-diameter core). Twenty-four fiber-optic detector bundles (3 mm diameter) were placed on the participant's scalp, and light from the source fibers that passed through the participant's scalp, skull, and brain to reach these detectors was carried to photomultiplier tubes within the Imagent. The photomultiplier tubes were modulated at 110.003125 MHz, thereby generating a 3.125-kHz heterodyning frequency (i.e., cross-correlation frequency). The A–D sampling rate was 50 kHz, the output current was fast Fourier transformed, and DC (average) intensity, AC (amplitude) intensity, and relative phase delay measures were computed every 1.6 msec. Only phase delay data are reported here, because a previous study (Gratton et al., 2006) demonstrated better sensitivity for measuring EROS with the phase delay measure.
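The computation of the three measures from the heterodyned detector output can be sketched as follows (an illustration of the principle on synthetic data, not the Imagent's actual implementation):

```python
import numpy as np

FS = 50_000      # A-D sampling rate (Hz)
F_HET = 3_125    # heterodyning (cross-correlation) frequency (Hz)

def dc_ac_phase(detector_output):
    """Recover DC intensity, AC amplitude, and relative phase from one 1.6-msec
    frame of heterodyned detector output via the FFT."""
    n = len(detector_output)
    spectrum = np.fft.rfft(detector_output)
    k = round(F_HET * n / FS)            # FFT bin holding the 3.125-kHz component
    dc = spectrum[0].real / n            # mean light intensity
    ac = 2 * np.abs(spectrum[k]) / n     # modulation amplitude
    phase = np.angle(spectrum[k])        # relative phase delay (radians)
    return dc, ac, phase

# Synthetic frame: 80 samples at 50 kHz = 1.6 msec, with a known 0.3-rad phase shift.
t = np.arange(80) / FS
frame = 5.0 + 2.0 * np.cos(2 * np.pi * F_HET * t + 0.3)
dc, ac, phase = dc_ac_phase(frame)
assert abs(dc - 5.0) < 1e-9 and abs(ac - 2.0) < 1e-9 and abs(phase - 0.3) < 1e-9
```

Note that one 1.6-msec frame contains exactly five heterodyne cycles (80 samples at 50 kHz), so the 3.125-kHz component falls on an exact FFT bin.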

A custom-built head mount system held the light source and detector fibers in position on the participant's head. A single montage was used to interrogate the left STG, the left inferior frontal gyrus (IFG), and the occipital cortex (in both hemispheres) at the same time (Figure 3).2 The montage comprised 24 detectors and 40 sources. Each detector could receive light from only 16 sources, which were time-multiplexed, yielding a total of 384 source–detector pairs (24 detectors × 16 light sources) and an effective sampling interval of 25.6 msec (i.e., a sampling rate of approximately 39.1 Hz). This montage configuration allows for high spatial resolution recording at the expense of area coverage (i.e., covering the right and left occipital, left inferior parietal, left temporal and left inferior frontal cortex, but not the entire head).

Figure 3. 

An example of the EROS recording montage overlaid on the structural MRI of a representative participant. The positions of the light sources (white circles) and detectors (dark gray circles), covering the frontal, temporal, and occipital cortices, were projected on the left lateral (left) and posterior coronal (right) surface of the structural MRI.

The functional optical data were coregistered with individual structural MRI. T1-weighted 3-D anatomical MRIs were obtained for each participant using a Siemens Trio 3-T scanner. The nasion and preauricular points were marked with Beekley Spots (Beekley Corporation, Bristol, CT) in each MRI scan. The same fiducial points, as well as 150 points scattered around the scalp and eye socket regions, were digitized in 3-D space (Polhemus Fastrak 3Space, Colchester, VT) and used for coregistration with the MR anatomical data (see Whalen, Maclin, Fabiani, & Gratton, 2008). The locations of the recording points were transformed into Talairach space (Talairach & Tournoux, 1988). This location information was used for the reconstruction of the expected light path for each channel and participant in a common Talairach space (see details below; see also Gratton & Fabiani, 2009, and Figure 1 in Tse, Gordon, Fabiani, & Gratton, 2010 for detailed descriptions of the preprocessing, coregistration, and analysis procedures).

The optical data were corrected for phase wrapping, transformed into picoseconds, normalized by subtracting the phase mean, pulse-corrected (Gratton & Corballis, 1995), filtered with a 1–10 Hz band-pass filter, and then segmented (using epochs 2000 msec long, including a 204.8 msec/8 data points prestimulus baseline interval) and averaged for each channel, time point, and condition. The influence of noisy channels was reduced by eliminating from the analysis channels that had standard deviations of the phase greater than 160 psec and/or a source detector distance shorter than 15 mm or greater than 75 mm (Gratton et al., 2006).
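A minimal sketch of this per-channel pipeline follows (phase-wrapping correction, scaling to picoseconds, and pulse correction are assumed to have been applied already; `FS_EROS`, the function name, and the synthetic data are ours):

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS_EROS = 1000 / 25.6   # effective EROS sampling rate, ~39.06 Hz

def preprocess_channel(phase_ps, events, pre=8, post=70):
    """Sketch of the per-channel pipeline: subtract the phase mean, band-pass
    1-10 Hz, segment into ~2000-msec epochs (8-point / 204.8-msec prestimulus
    baseline), and average. phase_ps is a phase-delay time series in picoseconds;
    events holds the sample indices of stimulus onsets."""
    x = phase_ps - phase_ps.mean()                      # normalize
    b, a = butter(3, [1.0, 10.0], btype='bandpass', fs=FS_EROS)
    x = filtfilt(b, a, x)                               # zero-phase 1-10 Hz filter
    epochs = np.stack([x[e - pre:e + post] for e in events])
    return epochs.mean(axis=0)

rng = np.random.default_rng(0)
avg = preprocess_channel(rng.standard_normal(4000), events=[100, 600, 1100])
assert avg.shape == (78,)   # 78 points x 25.6 msec ~ 2000 msec
```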

The 3-D reconstruction and statistical analysis of the optical data were carried out using Opt-3D (see Gratton & Fabiani, 2009; Gratton, 2000). Specifically, for each participant, the optical signal for a given voxel was defined as the mean value of the channels that overlapped at that particular voxel (Wolf et al., 2000); t statistics were calculated at the group level for each voxel for the phase data and converted into Z scores. An 8-mm spatial filter was applied to the EROS data. Statistical maps of the optical signal for each data point (i.e., a time window of 25.6 msec) were generated with corrections for multiple comparisons using the random field theory approach (Friston et al., 1994). The EROS data were back-projected onto the lateral and posterior views of the brain and averaged across the x axis (left–right). As we were interested in the left hemisphere and the visual cortex, only data from the middle sagittal plane to the surface of the left hemisphere and data from the posterior coronal plane were included in the analysis. Because the statistical map was surface-projected onto a template brain, the Talairach coordinates reported here comprise only the y (anterior–posterior) and z (dorsal–ventral) values for the sagittal plane and the x (left–right) and z values for the coronal plane.
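The per-voxel conversion of group-level t statistics into Z scores can be sketched as follows (Opt-3D performs this internally; the random field theory correction for multiple comparisons and the spatial filtering are omitted here, and the function name is ours):

```python
import numpy as np
from scipy import stats

def group_t_to_z(subject_values):
    """One-sample t test across participants for a single voxel, with the t value
    converted to a Z score of equal tail probability."""
    n = len(subject_values)
    t, _ = stats.ttest_1samp(subject_values, popmean=0.0)
    p_one_sided = stats.t.sf(t, df=n - 1)    # one-sided p for the observed t
    return stats.norm.isf(p_one_sided)       # Z score with the same tail probability

# Hypothetical per-participant phase-contrast values at one voxel:
z = group_t_to_z(np.array([1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 1.0, 0.7]))
assert z > 3.0
```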

To compare the spatiotemporal dynamics involved in AV deviance detection and AV integration, the EROS analysis focused on the contrasts described in Table 1. Three ROIs, frontal, temporal, and occipital, were used for statistical analyses and were constructed based on previous studies on deviance detection and AV integration (Tse & Penney, 2007, 2008; Saint-Amour et al., 2007; Skipper et al., 2005, 2007; Tse et al., 2006; Miller & D'Esposito, 2005; Callan et al., 2003; Möttönen et al., 2002; Opitz, Rinne, Mecklinger, Von Cramon, & Schröger, 2002; Rinne et al., 1999). Previous ERP and MEG studies (Saint-Amour et al., 2007; Colin et al., 2002, 2004; Sams et al., 1991) showed that AV integration and deviance detection take place between 150 and 400 msec postdeviance onset. Therefore, the interval of interest analysis (IOI) focused on this time window.

ERP Recording and Analysis

EEG was recorded simultaneously for comparison with previous studies on multisensory integration. In addition, as described above, we predicted that the ERP response in the combined-effects contrast would be larger than that in the avMMN contrast.

The EEG was recorded with gold electrodes at seven scalp locations based on the 10/20 system (Fz, Cz, Pz, C3, C4, left and right mastoid) with a reference electrode placed on the nose tip. Four electrodes, one above and one below the right eye and two at the outer canthus of each eye, were used for bipolar vertical and horizontal EOG recording. The EEG was filtered online using a 0.01–30 Hz band pass and sampled at 200 Hz. Electrode impedance was kept below 5 kΩ. The EEG and EOG were filtered offline with a 0.1–20 Hz band pass. The EEG waveforms were segmented using 2500-msec epochs time-locked to the onset of the speech burst, including a 200-msec baseline before the burst, and averaged according to stimulus condition. Ocular artifacts were corrected (Gratton, Coles, & Donchin, 1983) and epochs (less than 5% of total trials) containing other EEG artifacts (i.e., blocking or a range exceeding 200 μV) were removed from the analysis.
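The segmentation and artifact-rejection steps can be illustrated with a short sketch. This is a hypothetical single-channel helper written for this description, not the authors' pipeline; the epoch length (2.5 sec including a 200-msec baseline at 200 Hz) and the 200-μV range criterion are taken from the text.

```python
import numpy as np

def epoch_and_reject(eeg, events, sfreq=200, tmin=-0.2, tmax=2.3, max_range=200.0):
    """Cut epochs time-locked to each event sample, subtract the 200-msec
    prestimulus baseline, and drop epochs whose peak-to-peak range
    exceeds the 200-uV artifact criterion."""
    n_pre = int(round(-tmin * sfreq))           # 40 baseline samples
    n_tot = int(round((tmax - tmin) * sfreq))   # 500 samples per 2.5-sec epoch
    kept = []
    for ev in events:
        seg = eeg[ev - n_pre : ev - n_pre + n_tot].astype(float).copy()
        seg -= seg[:n_pre].mean()               # baseline correction
        if np.ptp(seg) <= max_range:            # reject range > 200 uV
            kept.append(seg)
    return np.array(kept)

rng = np.random.default_rng(1)
eeg = rng.normal(0.0, 5.0, 2000)  # 10 sec of simulated single-channel EEG (uV)
eeg[1200] += 500.0                # inject one large artifact
epochs = epoch_and_reject(eeg, events=[300, 1100])  # second epoch is rejected
```

The surviving epochs would then be averaged by stimulus condition, after the separate regression-based ocular correction (Gratton, Coles, & Donchin, 1983).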

To compare the times of activation for the combined-effects and avMMN contrasts, we first identified (using running t tests) the time intervals with the strongest activation within the IOI (i.e., 150–400 msec), as revealed by significant differences between the pairs of waveforms used in each contrast. We then computed the mean amplitudes across the time intervals exhibiting significant differences for each contrast. These mean values were entered into repeated-measures ANOVAs to identify sustained differences across the contrasts. Following previous MMN research, the ERP analyses focused on the Fz electrode, where the effects of stimulus deviance are typically largest. In addition, the ERP responses observed at the Fz electrode and at the mastoid electrodes (all referenced to the nose tip) were examined for an inversion of the polarity of the response, considered a typical property of the MMN and reflecting its origin in the superior bank of the temporal lobe. We also examined the ERPs obtained at the Pz electrode for possible differences between the visual deviants and standards (contrast [3–4]); however, no significant differences were observed at that electrode location.
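The running-t-test step above can be sketched as follows. The helper name and the simulated data are hypothetical; the logic, a paired t test at every time sample followed by extraction of contiguous significant runs, matches the procedure described, and the mean amplitudes over the returned intervals would then enter the ANOVAs.

```python
import numpy as np
from scipy import stats

def running_ttest_intervals(cond_a, cond_b, times, alpha=0.05):
    """Paired t test at every sample; return (start, end) times of
    contiguous runs where p < alpha.  cond_a, cond_b: participants x samples."""
    p = stats.ttest_rel(cond_a, cond_b, axis=0).pvalue
    sig = p < alpha
    intervals, start = [], None
    for i, s in enumerate(sig):
        if s and start is None:
            start = i                                 # run opens
        elif not s and start is not None:
            intervals.append((times[start], times[i - 1]))  # run closes
            start = None
    if start is not None:
        intervals.append((times[start], times[-1]))
    return intervals

rng = np.random.default_rng(2)
times = np.arange(50) * 5                  # 0-245 msec in 5-msec steps
a = rng.normal(0.0, 1.0, (16, 50))
b = a + rng.normal(0.0, 1.0, (16, 50))
b[:, 10:20] += 2.0                         # reliable difference at 50-95 msec
intervals = running_ttest_intervals(a, b, times)
```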

RESULTS

Behavioral Results

Behavioral results showed that all participants responded correctly to at least 88% of the catch trials (mean accuracy = 94%, SD = 3.4%, range = 88–100% correct), indicating that they attended to both the visual and auditory channels throughout the experiment.

EROS Results

Statistical maps of the EROS results are shown in Figure 4. The peak Z scores, critical Z values, and locations of the significant peak optical responses for the combined-effects and avMMN contrasts are summarized in Table 2. In the combined-effects contrast, significant increases in optical signals were found in the IFG from 179 to 230 msec, the middle temporal gyrus (MTG) from 332 to 383 msec, and the STG from 383 to 409 msec. In the avMMN contrast, similar to the combined-effects contrast, increases in optical signals were found in the MTG from 332 to 358 msec. However, unlike the combined-effects contrast, an early optical response in the STG from 204 to 230 msec was found, and there was no significant optical response in the frontal ROI. No significant optical responses were observed in the visual control contrast within the IOI.

Figure 4. 

Statistical maps of the EROS results overlaid on the left lateral and posterior coronal views of a template brain. The left, middle, and right columns present results from the combined-effects, avMMN, and visual control contrasts, respectively. The top three rows show results in the time window from 153 to 230 msec, and the bottom three rows show results from 332 to 409 msec. Green boxes indicate the locations of the frontal, temporal, and occipital ROIs. The white cross within each ROI indicates the location of peak activity. The green boxes and white crosses are only shown in the time windows and contrasts with significant peak responses. The dark gray shading shows the area of cortex interrogated by the recording montage.

Table 2. 

Statistically Significant Peak EROS Responses

AV Deviant versus AV Standard Interaction Contrast (combined effects)

Time (msec)   ROI         Coordinate   Peak Z (Critical Z)   Location   BA
179–204       Frontal     38, −1       3.16 (2.40)           IFG        45, 46
204–230       Frontal     38, −1       2.87 (2.60)           IFG        45, 46
332–358       Temporal    −33, −8      2.94 (2.43)           MTG        21
              Occipital   −13, 4       2.84 (2.59)           Cuneus     17, 18
358–383       Temporal    −33, −8      2.96 (2.37)           MTG        21
              Occipital   2, 9         2.83 (2.70)           Cuneus     17, 18
383–409       Temporal    −38, 17      2.56 (2.30)           STG        22

AV Deviant versus AV Control Contrast (avMMN)

Time (msec)   ROI         Coordinate   Peak Z (Critical Z)   Location   BA
204–230       Temporal    −51, 14      2.70 (2.50)           STG        22
332–358       Temporal    −26, −3      2.50 (2.43)           MTG        21

Brodmann's areas and corresponding brain locations were obtained from the Talairach Daemon, which reports the nearest gray matter to the peak EROS response. BA = Brodmann's area.

On the basis of the locations and latencies shown in Table 2, the optical responses were grouped into four clusters: IFG (179–230 msec), STG (204–230 and 383–409 msec), MTG (332–383 msec), and OCC (332–383 msec). The peak amplitude of each cluster and its corresponding standard error for the combined-effects and avMMN contrasts and their component conditions are shown in Figure 5. A repeated-measures ANOVA on the peak optical response with contrast and cluster as factors showed nonsignificant main effects of contrast (F(1, 12) = 2.556, p > .05) and cluster (F(4, 48) = 0.742, p > .05 with Greenhouse-Geisser correction, ε = .481), but a significant interaction effect (F(4, 48) = 4.126, p < .05 with Greenhouse-Geisser correction, ε = .527), suggesting differences in the spatiotemporal dynamics of the brain responses between the two contrasts.
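The Greenhouse-Geisser epsilon reported with these ANOVAs quantifies how far the condition covariance departs from sphericity; both degrees of freedom are multiplied by ε before the F test. A minimal sketch, with simulated data standing in for the real measurements:

```python
import numpy as np

def gg_epsilon(data):
    """Greenhouse-Geisser epsilon for one repeated-measures factor.
    data: participants x conditions.  epsilon = 1 under perfect sphericity
    and falls toward 1/(k - 1) as sphericity is violated."""
    k = data.shape[1]
    S = np.cov(data, rowvar=False)  # covariance of the condition scores
    # Orthonormal contrasts spanning the (k-1)-dim space of condition differences.
    C = np.linalg.qr(np.eye(k) - 1.0 / k)[0][:, : k - 1].T
    M = C @ S @ C.T
    return float(np.trace(M) ** 2 / ((k - 1) * np.trace(M @ M)))

rng = np.random.default_rng(3)
# 20 simulated participants x 5 conditions with a shared per-participant offset,
# which yields an approximately compound-symmetric (spherical) covariance.
data = rng.normal(0.0, 1.0, (20, 5)) + rng.normal(0.0, 1.0, (20, 1))
eps = gg_epsilon(data)
```

For spherical data ε stays near 1 and the correction is mild; the ε values of about .5 reported above indicate substantial sphericity violations, hence the corrected tests.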

Figure 5. 

(A) Peak amplitude of the EROS response averaged across participants for the four clusters of optical responses identified in Figure 4 in the combined-effects and avMMN contrasts. Vertical dashed lines separate the different clusters of optical responses. Error bars indicate the SEM computed across participants. (B) Peak amplitude of the EROS responses for each condition (unsubtracted).

Paired t tests (one-tailed) were conducted to investigate whether the optical responses in the combined-effects contrast were larger than those in the avMMN contrast for the IFG (179–230), MTG (332–383), and OCC (332–383) clusters. A significantly larger optical response was found in the IFG cluster (179–230; t(15) = 1.799, p < .05) and a marginally larger optical response was observed in the OCC cluster (332–383; t(14) = 1.709, p = .054) for the combined-effects contrast compared to the avMMN contrast. However, no significant difference in the optical responses between the contrasts was revealed in the MTG cluster (332–383; t(13) = −.103, p > .05).

Because of the similarity of the locations of the STG clusters (204–230 and 383–409), a repeated-measures ANOVA with Time of activation and Contrast as factors was conducted to investigate the presence of a possible delay in the STG response. The main effects of Time of activation, F(1, 15) = 0.866, p > .05, and Contrast, F(1, 15) = 1.801, p > .05, were not significant but the interaction effect was, F(1, 15) = 6.171, p < .05. Paired sample t tests showed a larger optical response in the avMMN contrast from 204 to 230 msec, t(15) = −2.292, p < .05, and a larger optical response in the combined-effects contrast from 383 to 409 msec, t(15) = 1.887, p < .05. This result suggests the existence of a delay in the STG response in the combined-effects contrast.

Further analysis suggests that the IFG activity in the combined-effects contrast is due to a decrease in the frontal response to the AV standards (Figure 6, top), peak Z = −2.63, critical Z = −2.55, y = 38, z = −1. This indicates suppression or habituation in the AV integration process with repeated AV speech stimuli. A similar effect was found in the STG from 383 to 409 msec in the congruent AV standards (Figure 6, bottom), peak Z = −3.24, critical Z = −2.47, y = −41, z = 17.

Figure 6. 

Statistical maps of the EROS results for the AV standards between 179–204 msec (top) and 383–409 msec (bottom).

ERP Results

Figure 7 shows the averaged ERP waveforms of the combined-effects contrast and the avMMN contrast; the visual control contrast is also included for reference. Running t tests (against baseline) revealed a significant increase in negativity from 270 to 375 msec in the waveform of the combined-effects contrast and an earlier negativity from 230 to 265 msec in the waveform of the avMMN contrast. On the basis of these results, mean amplitudes of the ERP response across early (230–265 msec) and late (270–375 msec) time windows were calculated for each of the contrasts for the Fz and left and right mastoid electrodes. The mean amplitudes and standard errors of the means are shown in Figure 8. One-sample t tests against baseline showed significant negativities in both the early (mean = −1.242 μV, SD = 2.267, t(15) = −2.191, p < .05) and late time windows (mean = −1.849 μV, SD = 1.582, t(15) = −4.675, p < .05) in the combined-effects contrast. In the avMMN contrast, a significant negativity was found in the early time window (mean = −0.793 μV, SD = 1.047, t(15) = −3.027, p < .05) but not in the late time window (mean = −0.545 μV, SD = 1.285, t(15) = −1.696, p > .05).

Figure 7. 

ERP waveforms averaged across participants for the combined-effects contrast (top) and avMMN contrast (middle) for the left mastoid (left column), Fz (middle column), and right mastoid (right column) electrodes. The “AV Diff” waveform represents the difference waveform obtained by subtracting the AV standard waveform from the AV deviant waveform in the AVB, and the “V diff” waveform represents the difference waveform obtained by subtracting the V standard waveform from the V deviant waveform in the VB. The light gray area indicates the time window showing the most significant response in the combined-effects contrast. The dark gray area indicates the time window showing the most significant response in the avMMN contrast. The mean amplitude in the dark gray time window was also significant for the combined-effects contrast. ERP waveforms for the visual control contrast of the Fz, Cz, and Pz electrodes are shown in the bottom panel.

Figure 8. 

ERP amplitudes averaged across participants for the early (230–265 msec) and late (270–375 msec) time windows reported in Figure 7 for the combined-effects and avMMN contrasts. Error bars indicate the SEM computed across participants.

A repeated-measures ANOVA with Contrast and Time window as factors showed no significant main effects of either Contrast, F(1, 15) = 2.403, p > .05, or Time window, F(1, 15) = 0.447, p > .05, but a significant interaction effect, F(1, 15) = 4.630, p < .05. Analyses of simple effects with paired-sample t tests suggested a significant difference in the mean amplitude between the contrasts, t(15) = −2.384, p < .05, in the late time window, indicating a larger negativity in the combined-effects contrast. However, no significant difference between the contrasts was found in the early time window, t(15) = −0.694, p > .05. These results suggest the presence of a more sustained and larger negativity in the combined-effects contrast compared to the avMMN contrast. This is consistent with the EROS results, which also show a more extensive response for the combined-effects contrast than for the avMMN contrast.

DISCUSSION

In this study, we examined when and where the brain processes associated with AV integration occurred, using McGurk stimuli as a test case. We assumed that AV integration should involve interactions between unimodal and multimodal areas. Specifically, relevant unimodal sensory areas included both auditory (STG and MTG) and visual (occipital) regions. Among the multimodal association areas, we focused on the IFG because it receives inputs from both the auditory and visual modalities, and it is involved in speech processing. Our prediction was that processes underlying AV integration within the IFG should be initiated very early during processing, within the first 200 msec from stimulation, to account for the fact that AV integration influences the formation of phoneme percepts in the McGurk effect. Furthermore, we predicted that AV integration should occur before unimodal processing is completed.

To isolate the effects of AV integration from other processes that are likely to occur when presenting multimodal stimuli, (a) we employed a passive oddball paradigm that kept constant several factors (such as attention and response demands) that might influence brain activity in a multimodal experiment and (b) we included a set of control conditions (designed as statistical contrasts according to an additive factor logic; see Note 3) aimed at dissociating AV integration from AV deviance detection, similar to previous ERP studies (e.g., Jacobsen & Schröger, 2001; Näätänen & Alho, 1997; Schröger & Wolff, 1996). In addition, because our predictions required fine-grained temporal and spatial discriminations, we used a technique combining spatial and temporal resolution, EROS, which can provide independent measures of the time course of activity in different cortical regions. The critical comparison was between the time course of the activities related to the "combined-effects" contrast (which reflects both the AV integration and deviance detection processes) and the "avMMN" contrast (which reflects only the deviance detection process). A comparison between these two time courses allowed us to determine if, and when, each of these processes occurred in any particular brain region.

The results supported our predictions: Activity uniquely associated with AV integration occurs in the IFG between 179 and 230 msec and in occipital cortex between 332 and 383 msec. This activity precedes that associated with AV deviance detection, which occurs in the MTG between 332 and 383 msec. The spatial and temporal extension of the MTG activity suggested a possible interaction between the AV integration and AV deviance detection processes in the MTG. An interaction effect was also found in the STG: the latency of the STG activity differed between the contrasts, occurring earlier for the avMMN contrast (204–230 msec) than for the combined-effects contrast (383–409 msec).

Both the EROS and the simultaneously recorded ERP data showed more sustained brain activities in the combined-effects contrast than in the avMMN contrast, which is consistent with the prediction that both AV integration and deviance detection processes are present in the combined-effects contrast, whereas only AV deviance detection is involved in the avMMN contrast. In addition, activation patterns were similar in the EROS responses measured in temporal regions and in the ERP responses measured at the Fz electrode. Consistency between the EROS and ERP results has been commonly found in a number of combined EROS–ERP studies (e.g., Tse et al., 2006, 2013; Tse & Penney, 2007, 2008), and it is useful for cross-validating the EROS results.

Previous ERP and fMRI studies (e.g., Saint-Amour et al., 2007; Skipper et al., 2007) suggested a similar frontotemporal activation pattern in AV speech perception. The EROS results reported here support these findings and show a frontotemporal/occipital activation pattern in AV integration. In addition, the current study offered two advantages. First, the passive oddball paradigm allowed for dissociation of the AV integration process from the AV deviance detection process. Second, EROS revealed spatiotemporal dynamics without having to infer them from the slower hemodynamic signals. The sequence of activations (prefrontal areas followed by temporal and occipital cortex activities) suggests top–down control of the primary or association sensory areas by the pFC or a feedback mechanism from the prefrontal to the sensory cortex.

The pFC involvement in AV integration observed in fMRI studies (Skipper et al., 2005, 2007; Miller & D'Esposito, 2005; Callan et al., 2003) is thought to index difficulty in AV integration. For example, IFG activity was found to increase with difficulty in forming a fused percept of AV incongruent speech (Skipper et al., 2007; Miller & D'Esposito, 2005). However, this frontal activity is suppressed if one can easily identify the syllable on the basis of the auditory channel alone (Skipper et al., 2007). Our results also suggest that the IFG differences observed in the combined-effects contrast are due in part to a decrease of the frontal response to the AV standards, indicating suppression or habituation of the AV integration process with repeated AV speech stimuli. In addition, a similar suppression or habituation effect was found in the STG from 383 to 409 msec in the congruent AV standards, consistent with results from ERP studies (van Wassenhove, Grant, & Poeppel, 2005; Besle, Fort, Delpuech, & Giard, 2004).

Although IFG activation is commonly found in speech perception, its functional role remains controversial. Skipper et al. (2007) proposed that the IFG, as part of the speech production system, is required for speech perception. Specifically, information generated from the speech production system may be used to constrain the percept in AV speech. Further support for this theory was provided by the observation of frontal and primary motor cortex activities in response to speech sounds (see Fadiga, Craighero, & Olivier, 2005, for a review). However, other studies (e.g., Sundara, Namasivayam, & Chen, 2001) found frontal and motor cortex activation when observing speech movements, but not when listening to speech sounds. Another possible functional role of the IFG activation is that it may reflect response selection as suggested by a TMS study (Tremblay & Gracco, 2009). The left IFG may be recruited for top–down control to resolve incongruent linguistic stimuli (January, Trueswell, & Thompson-Schill, 2009). Specifically, the IFG may be responsible for selecting among competing representations (selection or interference resolution; Nelson, Reuter-Lorenz, Persson, Sylvester, & Jonides, 2009). Although the frontal-followed-by-sensory cortex activation pattern is consistent with the predictions formulated by Skipper and colleagues (2007), the experimental design of the current study does not allow us to make conclusive statements about the functional role of the frontal cortex in AV integration. In addition, because of limitations in the number of optical recording channels available, we focused on the left frontal and temporal regions (whose involvement was predicted a priori), rather than on the entire brain. Further studies are needed to investigate the involvement of the right frontal and temporal cortices in AV integration and deviance detection.

Compared with previous optical aMMN studies (Tse et al., 2006, 2013; Tse & Penney, 2007), the STG and MTG optical responses observed in the current study are relatively posterior and inferior to the positions of the optical aMMN recorded in the right hemisphere. Such results suggest that higher-level processing in the secondary auditory cortex or association areas is needed for AV deviance detection. This is consistent with the view proposed by Fuster and colleagues (Fuster, 2001, 2006; Fuster, Bodner, & Kroger, 2000) in which sensory information transfers from primary sensory cortex to association areas to form complex representations by integrating basic sensory percepts. In other words, association areas take input from multiple sensory areas to form multisensory percepts.

The results of this study show interactions between occipital, prefrontal, and temporal cortex during AV speech perception. Interestingly, animal studies using anterograde tracers demonstrate direct connections between primary visual cortex and auditory association cortex (Rockland & Ojima, 2003) and projections from the frontal cortex to auditory association cortex (Kaas & Hackett, 2000; Hackett, Stepniewska, & Kaas, 1999).

In summary, the current study showed that AV integration and AV deviance detection can be dissociated with proper control conditions. The AV integration process (in IFG) was found to take place earlier than the AV deviance detection process in temporal cortex. A frontal-temporal/occipital activation pattern, suggesting a top–down process or feedback system, was observed in AV integration. AV integration and AV deviance detection appeared to interact with each other, leading to modulation in the amplitude or latency of brain responses. Most importantly, our data demonstrate that multimodal integration occurs very early during the processing of bimodal stimuli, at a stage at which phoneme perception is still incomplete. This accounts for the fact that visual information can influence the auditory percept (as exemplified by the McGurk effect). Multimodal integration appears to rely on the dynamic interaction between multimodal (the IFG) and modality-specific areas (MTG and STG for the auditory modality and occipital cortex for the visual modality), which are activated in rapid alternation during the first few hundred milliseconds following stimulus presentation. This process may play an important role in generating the conscious experience of a multimodal stimulus.

Acknowledgments

This work was supported by seed funding from the Beckman Institute and Carle Research Foundation. We thank Gary Dell and Greg Miller for reading earlier drafts of this manuscript and Jaimie Gilbert for providing the video stimuli. This work was performed in partial fulfillment of the PhD degree at the University of Illinois by the first author.

Reprint requests should be sent to Chun-Yu Tse, Department of Psychology and Center for Cognition and Brain Studies, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, or via e-mail: cytse@psy.cuhk.edu.hk.

Notes

1. 

The term “deviant” is used here to refer to a stimulus that occurs less often than another “standard” stimulus. It does not refer to the mismatch between visual and auditory cues in the McGurk stimuli, which is instead referred to as “incongruence.” In most of the studies reviewed here as well as in the current study, incongruent McGurk stimuli were in fact also deviant. Thus, the term “deviance detection” here does not mean detecting the mismatch between the visual and auditory components of the McGurk stimuli, but rather the detection of an infrequent stimulus.

2. 

On the basis of prior studies, we predicted left frontal and temporal, as well as bilateral occipital, activity. The optical recording equipment trades off between temporal resolution, spatial resolution, and the extent of the brain area from which the optical data are obtained. Therefore, the choice of recording montage needs to be made a priori.

3. 

The additive factor model (Sternberg, 1969) makes two major assumptions: (a) a new process can be added in response to particular task demands without necessarily altering existing ones (postulate of pure insertion) and (b) a particular manipulation may influence only one particular process and not others (postulate of selective influence). Within the context of cognitive neuroimaging, a typical example of application of this logic is the use of subtraction methods in the GLM to test differences across conditions. As in all experiments involving control conditions, the validity of these assumptions depends on the application of theoretically and ecologically valid experimental and control conditions.

REFERENCES

REFERENCES
Beauchamp
,
M. S.
,
Nath
,
A. R.
, &
Pasalar
,
S.
(
2010
).
fMRI-guided transcranial magnetic stimulation reveals that the superior temporal sulcus is a cortical locus of the McGurk effect
.
Journal of Neuroscience
,
30
,
2414
2417
.
Besle
,
J.
,
Bertrand
,
O.
, &
Giard
,
M. H.
(
2009
).
Electrophysiological (EEG, sEEG, MEG) evidence for multiple audiovisual interactions in the human auditory cortex
.
Hearing Research
,
258
,
143
151
.
Besle
,
J.
,
Fort
,
A.
,
Delpuech
,
C.
, &
Giard
,
M. H.
(
2004
).
Bimodal speech: Early suppressive visual effects in human auditory cortex
.
European Journal of Neuroscience
,
20
,
2225
2234
.
Buchan
,
J. N.
, &
Munhall
,
K. G.
(
2012
).
The effect of a concurrent working memory task and temporal offsets on the integration of auditory and visual speech information
.
Seeing and Perceiving
,
25
,
87
106
.
Butler
,
J. S.
,
Foxe
,
J. J.
,
Fiebelkorn
,
I. C.
,
Mercier
,
M. R.
, &
Molholm
,
S.
(
2012
).
Multisensory representation of frequency across audition and touch: High density electrical mapping reveals early sensory-perceptual coupling
.
Journal of Neuroscience
,
32
,
15338
15344
.
Butler
,
J. S.
,
Molholm
,
S.
,
Fiebelkorn
,
I. C.
,
Mercier
,
M. R.
,
Schwartz
,
T. H.
, &
Foxe
,
J. J.
(
2011
).
Common or redundant neural circuits for duration processing across audition and touch
.
Journal of Neuroscience
,
31
,
3400
3406
.
Callan
,
D. E.
,
Jones
,
J. A.
,
Munhall
,
K.
,
Callan
,
A. M.
,
Kroos
,
C.
, &
Vatikiotis-Bateson
,
E.
(
2003
).
Neural processes underlying perceptual enhancement by visual speech gestures
.
NeuroReport
,
14
,
2213
2218
.
Calvert
,
G. A.
(
2001
).
Crossmodal processing in the human brain: Insights from functional neuroimaging studies
.
Cerebral Cortex
,
11
,
1110
1123
.
Calvert
,
G. A.
,
Bullmore
,
E. T.
,
Brammer
,
M. J.
,
Campbell
,
R.
,
Williams
,
S. C.
,
McGuire
,
P. K.
, et al
(
1997
).
Activation of auditory cortex during silent lipreading
.
Science
,
276
,
593
596
.
Chiarelli
,
A. M.
,
Di Vacri
,
A.
,
Romani
,
G. L.
, &
Merla
,
A.
(
2013
).
Fast optical signal in visual cortex: Improving detection by General Linear Convolution Model
.
Neuroimage
,
66
,
194
202
.
Colin
,
C.
,
Radeau
,
M.
,
Soquet
,
A.
, &
Deltenre
,
P.
(
2004
).
Generalization of the generation of an MMN by illusory McGurk percepts: Voiceless consonants
.
Clinical Neurophysiology
,
115
,
1989
2000
.
Colin
,
C.
,
Radeau
,
M.
,
Soquet
,
A.
,
Demolin
,
D.
,
Colin
,
F.
, &
Deltenre
,
P.
(
2002
).
Mismatch negativity evoked by the McGurk–MacDonald effect: A phonetic representation within short-term memory
.
Clinical Neurophysiology
,
113
,
495
506
.
Czigler
,
I.
(
2007
).
Visual mismatch negativity: Violation of nonattended environmental regularities
.
Journal of Psychophysiology
,
21
,
224
.
Fabiani
,
M.
,
Gratton
,
G.
, &
Federmeier
,
K.
(
2007
).
Event related brain potentials
. In
J.
Cacioppo
,
L.
Tassinary
, &
G.
Berntson
(Eds.),
Handbook of psychophysiology
(3rd ed., pp.
85
119
).
Cambridge, UK
:
Cambridge University Press
.
Fabiani
,
M.
,
Low
,
K. A.
,
Wee
,
E.
,
Sable
,
J. J.
, &
Gratton
,
G.
(
2006
).
Reduced suppression or labile memory? Mechanisms of inefficient filtering of irrelevant information in older adults
.
Journal of Cognitive Neuroscience
,
18
,
637
650
.
Fadiga
,
L.
,
Craighero
,
L.
, &
Olivier
,
E.
(
2005
).
Human motor cortex excitability during the perception of others' action
.
Current Opinion in Neurobiology
,
15
,
213
218
.
Friston
,
K. J.
,
Holmes
,
A. P.
,
Worsley
,
K. J.
,
Poline
,
J. P.
,
Frith
,
C. D.
, &
Frackowiak
,
R. S.
(
1994
).
Statistical parametric maps in functional imaging: A general linear approach
.
Human Brain Mapping
,
2
,
189
210
.
Fuster
,
J. M.
(
2001
).
The prefrontal cortex—An update: Time is of the essence
.
Neuron
,
30
,
319
333
.
Fuster
,
J. M.
(
2006
).
The cognit: A network model of cortical representation
.
International Journal of Psychophysiology
,
60
,
125
132
.
Fuster
,
J. M.
,
Bodner
,
M.
, &
Kroger
,
J. K.
(
2000
).
Cross-modal and cross-temporal association in neurons of frontal cortex
.
Nature
,
405
,
347
351
.
Gilbert
,
J. L.
,
Lansing
,
C. R.
, &
Garnsey
,
S. M.
(
2012
).
Seeing facial motion affects auditory processing in noise
.
Attention, Perception, & Psychophysics
,
74
,
1761
1781
.
Gratton
,
G.
2000
.
“Opt-cont” and “opt-3D”: A software suite for the analysis and 3D reconstruction of the event-related optical signal (EROS)
.
Psychophysiology
,
37
,
S44
.
Gratton, G., Brumback, C. R., Gordon, B. A., Pearson, M. A., Low, K. A., & Fabiani, M. (2006). Effects of measurement method, wavelength, and source-detector distance on the fast optical signal. Neuroimage, 32, 1576–1590.
Gratton, G., Coles, M. G., & Donchin, E. (1983). A new method for off-line removal of ocular artifact. Electroencephalography and Clinical Neurophysiology, 55, 468–484.
Gratton, G., & Corballis, P. M. (1995). Removing the heart from the brain: Compensation for the pulse artifact in the photon migration signal. Psychophysiology, 32, 292–299.
Gratton, G., Corballis, P. M., Cho, E., Fabiani, M., & Hood, D. C. (1995). Shades of gray matter: Noninvasive optical images of human brain responses during visual stimulation. Psychophysiology, 32, 505–509.
Gratton, G., & Fabiani, M. (2009). Fast optical signals: Principles, methods, and experimental results. In R. Frostig (Ed.), In vivo optical imaging of brain (2nd ed., pp. 435–460). Boca Raton, FL: CRC Press.
Gratton, G., & Fabiani, M. (2010). Fast optical imaging of human brain function. Frontiers in Human Neuroscience, 4, Article 52.
Hackett, T. A., Stepniewska, I., & Kaas, J. H. (1999). Prefrontal connections of the parabelt auditory cortex in macaque monkeys. Brain Research, 817, 45–58.
Jacobsen, T., & Schröger, E. (2001). Is there pre-attentive memory-based comparison of pitch? Psychophysiology, 38, 723–727.
January, D., Trueswell, J. C., & Thompson-Schill, S. L. (2009). Co-localization of Stroop and syntactic ambiguity resolution in Broca's area: Implications for the neural basis of sentence processing. Journal of Cognitive Neuroscience, 21, 2434–2444.
Jessen, S., & Kotz, S. A. (2013). On the role of crossmodal prediction in audiovisual emotion perception. Frontiers in Human Neuroscience, 7, Article 369.
Kaas, J. H., & Hackett, T. A. (2000). Subdivisions of auditory cortex and processing streams in primates. Proceedings of the National Academy of Sciences, U.S.A., 97, 11793–11799.
Keil, J., Müller, N., Ihssen, N., & Weisz, N. (2012). On the variability of the McGurk effect: Audiovisual integration depends on prestimulus brain states. Cerebral Cortex, 22, 221–231.
Kekoni, J., Hämäläinen, H., Saarinen, M., Gröhn, J., Reinikainen, K., Lehtokoski, A., et al. (1997). Rate effect and mismatch responses in the somatosensory system: ERP-recordings in humans. Biological Psychology, 46, 125–142.
Kimura, M., Katayama, J. I., & Murohashi, H. (2005). Positive difference in ERPs reflects independent processing of visual changes. Psychophysiology, 42, 369–379.
Kimura, M., Katayama, J. I., Ohira, H., & Schröger, E. (2009). Visual mismatch negativity: New evidence from the equiprobable paradigm. Psychophysiology, 46, 402–409.
Koechlin, E., Ody, C., & Kouneiher, F. (2003). The architecture of cognitive control in the human prefrontal cortex. Science, 302, 1181–1185.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.
Miller, L. M., & D'Esposito, M. (2005). Perceptual fusion and stimulus coincidence in the cross-modal integration of speech. Journal of Neuroscience, 25, 5884–5893.
Möttönen, R., Krause, C. M., Tiippana, K., & Sams, M. (2002). Processing of changes in visual speech in the human auditory cortex. Cognitive Brain Research, 13, 417–425.
Näätänen, R., & Alho, K. (1997). Higher-order processes in auditory-change detection. Trends in Cognitive Sciences, 1, 44–45.
Näätänen, R., & Michie, P. T. (1979). Early selective-attention effects on the evoked potential: A critical review and reinterpretation. Biological Psychology, 8, 81–136.
Näätänen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity (MMN) in basic research of central auditory processing: A review. Clinical Neurophysiology, 118, 2544–2590.
Nath, A. R., & Beauchamp, M. S. (2011). Dynamic changes in superior temporal sulcus connectivity during perception of noisy audiovisual speech. Journal of Neuroscience, 31, 1704–1714.
Nath, A. R., & Beauchamp, M. S. (2012). A neural basis for interindividual differences in the McGurk effect, a multisensory speech illusion. Neuroimage, 59, 781–787.
Nelson, J. K., Reuter-Lorenz, P. A., Persson, J., Sylvester, C. Y. C., & Jonides, J. (2009). Mapping interference resolution across task domains: A shared control process in left inferior frontal gyrus. Brain Research, 1256, 92–100.
Oldfield, R. C. (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia, 9, 97–113.
.
Opitz
,
B.
,
Rinne
,
T.
,
Mecklinger
,
A.
,
Von Cramon
,
D. Y.
, &
Schröger
,
E.
(
2002
).
Differential contribution of frontal and temporal cortices to auditory change detection: fMRI and ERP results
.
Neuroimage
,
15
,
167
174
.
Rinne
,
T.
,
Gratton
,
G.
,
Fabiani
,
M.
,
Cowan
,
N.
,
Maclin
,
E.
,
Stinard
,
A.
, et al (
1999
).
Scalp-recorded optical signals make sound processing in the auditory cortex visible?
Neuroimage
,
10
,
620
624
.
Rockland
,
K. S.
, &
Ojima
,
H.
(
2003
).
Multisensory convergence in calcarine visual areas in macaque monkey
.
International Journal of Psychophysiology
,
50
,
19
26
.
Ross, L. A., Saint-Amour, D., Leavitt, V. M., Javitt, D. C., & Foxe, J. J. (2007). Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cerebral Cortex, 17, 1147–1153.
Sable, J. J., Low, K. A., Whalen, C. J., Maclin, E. L., Fabiani, M., & Gratton, G. (2007). Optical imaging of temporal integration in human auditory cortex. European Journal of Neuroscience, 25, 298–306.
Saint-Amour, D., De Sanctis, P., Molholm, S., Ritter, W., & Foxe, J. J. (2007). Seeing voices: High-density electrical mapping and source-analysis of the multisensory mismatch negativity evoked during the McGurk illusion. Neuropsychologia, 45, 587–597.
.
Sams
,
M.
,
Aulanko
,
R.
,
Hämäläinen
,
M.
,
Hari
,
R.
,
Lounasmaa
,
O. V.
,
Lu
,
S. T.
, et al (
1991
).
Seeing speech: Visual information from lip movements modifies activity in the human auditory cortex
.
Neuroscience Letters
,
127
,
141
145
.
Sams
,
M.
,
Möttönen
,
R.
, &
Sihvonen
,
T.
(
2005
).
Seeing and hearing others and oneself talk
.
Cognitive Brain Research
,
23
,
429
435
.
Schröger
,
E.
(
2007
).
Mismatch negativity: A microphone into auditory memory
.
Journal of Psychophysiology
,
21
,
138
.
Schröger
,
E.
, &
Wolff
,
C.
(
1996
).
Mismatch response of the human brain to changes in sound location
.
NeuroReport
,
7
,
3005
3008
.
Skipper, J. I., Nusbaum, H. C., & Small, S. L. (2005). Listening to talking faces: Motor cortical activation during speech perception. Neuroimage, 25, 76–89.
Skipper, J. I., van Wassenhove, V., Nusbaum, H. C., & Small, S. L. (2007). Hearing lips and seeing voices: How cortical areas supporting speech production mediate audiovisual speech perception. Cerebral Cortex, 17, 2387–2399.
Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. Acta Psychologica, 30, 276–315.
.
Sumby
,
W. H.
, &
Pollack
,
I.
(
1954
).
Visual contribution to speech intelligibility in noise
.
Journal of the Acoustical Society of America
,
26
,
212
215
.
Sundara
,
M.
,
Namasivayam
,
A. K.
, &
Chen
,
R.
(
2001
).
Observation–execution matching system for speech: A magnetic stimulation study
.
NeuroReport
,
12
,
1341
1344
.
Szycik
,
G. R.
,
Stadler
,
J.
,
Tempelmann
,
C.
, &
Münte
,
T. F.
(
2012
).
Examining the McGurk illusion using high-field 7 Tesla functional MRI
.
Frontiers in Human Neuroscience
,
6
,
Article 95
.
Talairach
,
J.
, &
Tournoux
,
P.
(
1988
).
Co-planar stereotaxic atlas of the human brain. 3-Dimensional proportional system: An approach to cerebral imaging
.
New York
:
Thieme
.
Tremblay, P., & Gracco, V. L. (2009). Contribution of the pre-SMA to the production of words and non-speech oral motor gestures, as revealed by repetitive transcranial magnetic stimulation (rTMS). Brain Research, 1268, 112–124.
Tse, C. Y., Gordon, B. A., Fabiani, M., & Gratton, G. (2010). Frequency analysis of the visual steady-state response measured with the fast optical signal in younger and older adults. Biological Psychology, 85, 79–89.
Tse, C. Y., Low, K. A., Fabiani, M., & Gratton, G. (2012). Rules rule! Brain activity dissociates the representations of stimulus contingencies with varying levels of complexity. Journal of Cognitive Neuroscience, 24, 1941–1959.
Tse, C. Y., & Penney, T. B. (2007). Preattentive change detection using the event-related optical signal—Optical imaging of cortical activity elicited by unattended temporal deviants. IEEE Engineering in Medicine and Biology Magazine, 26, 52–58.
Tse, C. Y., & Penney, T. B. (2008). On the functional role of temporal and frontal cortex activation in passive detection of auditory deviance. Neuroimage, 41, 1462–1470.
Tse, C. Y., Rinne, T., Ng, K. K., & Penney, T. B. (2013). The functional role of the frontal cortex in pre-attentive auditory change detection. Neuroimage, 83, 870–879.
Tse, C. Y., Tien, K. R., & Penney, T. B. (2006). Event-related optical imaging reveals the temporal dynamics of right temporal and frontal cortex activation in pre-attentive change detection. Neuroimage, 29, 314–320.
van Wassenhove, V., Grant, K. W., & Poeppel, D. (2005). Visual speech speeds up the neural processing of auditory speech. Proceedings of the National Academy of Sciences, U.S.A., 102, 1181–1186.
Whalen, C., Maclin, E. L., Fabiani, M., & Gratton, G. (2008). Validation of a method for coregistering scalp recording locations with 3D structural MR images. Human Brain Mapping, 29, 1288–1301.
Winkler, I. (2007). Interpreting the mismatch negativity. Journal of Psychophysiology, 21, 147–163.
Winkler, I., Horvath, J., Weisz, J., & Trejo, L. J. (2009). Deviance detection in congruent audiovisual speech: Evidence for implicit integrated audiovisual memory representations. Biological Psychology, 82, 281–292.
Wolf, U., Wolf, M., Toronov, V., Michalos, A., Paunescu, L. A., & Gratton, E. (2000, April). Detecting cerebral functional slow and fast signals by frequency-domain near-infrared spectroscopy using two different sensors. OSA Biomedical Topical Meetings Technical Digest, 2000, 427–429.