Selective processing of task-relevant stimuli is critical for goal-directed behavior. We used electrocorticography to assess the spatio-temporal dynamics of cortical activation during a simple phonological target detection task, in which subjects press a button when a prespecified target syllable sound is heard. Simultaneous surface potential recordings during this task revealed a highly ordered temporal progression of high gamma (HG, 70–200 Hz) activity across the lateral hemisphere in less than 1 sec. The sequence demonstrated concurrent regional sensory processing of speech syllables in the posterior superior temporal gyrus (STG) and speech motor cortex, and then transitioned to sequential task-dependent processing from prefrontal cortex (PFC), to the final motor response in the hand sensorimotor cortex. STG activation was modestly enhanced for target over nontarget sounds, supporting a selective gain mechanism in early sensory processing, whereas PFC was entirely selective to targets, supporting its role in guiding response behavior. These results reveal that target detection is not a single cognitive event, but rather a process of progressive target selectivity that involves large-scale rapid parallel and serial processing in sensory, cognitive, and motor structures to support goal-directed human behavior.
Selective attention is a fundamental neural process that focuses cortical resources on salient information to support goal-directed behavior, often in the presence of competing environmental distractions. A classic method of probing selective attention is a target detection task, wherein a subject is instructed to respond when a prespecified (target) stimulus is encountered. Target selection is often treated as a unitary cognitive function. However, even this simple task requires successful performance of multiple distinct subprocesses including signal monitoring and detection, working memory, decision-making, behavioral planning, and response execution.
A promising approach for characterizing these components is to track the temporal cascade of neural signals in a spatially distributed cortical network. Information about the relative onset and duration of physiological activity across discrete regions can provide insight into the directionality of signal propagation, and can also highlight the functional roles of putative regions of interest (Miller & Wilson, 2008; Miller & D'Esposito, 2005).
Surgical epilepsy patients who are implanted with subdural electrodes for localization of seizure foci provide a rare setting to study the substrates of cognition in humans by directly recording neural data from the cortical surface (electrocorticography; ECoG). The few intracranial studies of attention or working memory using this approach have examined ERPs (Halgren et al., 1995) that contain phase-locked neural activity extracted by signal averaging. A wider temporal range of neural activity can be captured using time–frequency analysis of the local field potentials recorded in ECoG. Initial studies by Crone and others have demonstrated a highly robust, yet spatially and temporally discrete, marker of cortical neural activity in the high gamma (HG) spectral band (70–250 Hz) during auditory (Edwards, Soltani, Deouell, Berger, & Knight, 2005; Crone, Boatman, Gordon, & Hao, 2001), motor (Crone, Miglioretti, Gordon, & Lesser, 1998), and language processing (Sinai et al., 2005).
In the current study, we used ECoG to simultaneously track neural activity across multiple cortical areas during a phonological target detection task. The study was designed to sample from areas related to stimulus and response processing within the typical extent of electrode coverage over the lateral hemispheric cortical surface, including the superior temporal gyrus (STG; for speech sounds), lateral prefrontal cortex (PFC; for behavioral integration and executive control), and hand–sensorimotor cortex (for motor response). Subjects were presented with a sequence of speech syllables (e.g., /u/, /i/, /e/), and were instructed to press a button only when a prespecified target syllable was heard (e.g., /u/) (Figure 1A).
With this approach, we sought to answer some basic questions about the mechanisms of target detection. For example, when does target selectivity first appear during task behavior? What aspects of processing become exclusive to the target condition? Do temporal dynamics support concurrent prefrontal modulation of sensory areas, or does prefrontal activation occur after sensory processing? We found that focal increases in HG chronicled the evolution of cortical activity from early stimulus-driven processing in the pSTG and articulatory premotor cortex, to subsequent cognitive-control processing in PFC, and response execution in hand sensorimotor cortex. Target detection is therefore a process marked by parallel neural engagement in some local cortical areas, whereas activity between regions supporting task performance unfolds in a serial manner.
Four patients underwent surgical placement of a subdural electrode grid for clinical localization of seizure foci in the surgical work-up for intractable epilepsy. All recordings were made from the left hemisphere. All patients were right-handed and demonstrated left dominance for language on Wada intra-arterial sodium amytal testing. Research was approved by the UC San Francisco and UC Berkeley institutional review boards for human research.
Five syllables were used in this experiment, including two consonant–vowel sounds: /pa/ and /ba/; a semivowel: /wa/; and two vowels: /aa/ and /uu/. Syllables were taken from five recordings of each from four male speakers (for a total of 100 individual phonemes). Each stimulus was normalized by amplitude and duration (350 msec). The tokens were presented to subjects in 2 blocks of 500 stimuli, each lasting approximately 6.5 min.
The syllables were presented binaurally at ∼70–75 dB SPL via portable open-field loudspeakers placed in front of and below the patient's head at a distance of ∼50 cm. The stimulus onset asynchrony (onset-to-onset) was 775 ± 50 msec. The interstimulus interval (offset-to-onset) was 450 ± 50 msec (Figure 1A).
In the main target detection task (0-back), subjects were instructed to press a button when they heard a prespecified target stimulus. In one subject (GP1), passive and working memory conditions were added. In the passive condition, the subject was instructed to ignore the syllables and look at a slide presentation of photographs. In the working memory condition, the subject was instructed to respond when the presented syllable was the same as the syllable presented two trials earlier (2-back task). Accuracy and response time were recorded during the task. RT differences between conditions were assessed with a simple independent-samples t test. All subjects underwent a training session before recording to ensure they understood the task.
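As a rough illustration of the RT comparison, the following Python sketch computes an independent-samples t statistic and its degrees of freedom. The pooled-variance (Student) form, the function name `independent_t`, and the RT values are assumptions for illustration only; the text does not state which variant of the test was used, and the analysis itself was carried out in MATLAB.

```python
import math
from statistics import mean, variance

def independent_t(sample_a, sample_b):
    """Student's independent-samples t statistic (pooled variance) and its df."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df

# Hypothetical RTs in msec (not data from the study).
rt_two_back = [760.0, 742.0, 801.0, 730.0, 775.0]
rt_zero_back = [610.0, 655.0, 598.0, 640.0, 622.0]
t_stat, df = independent_t(rt_two_back, rt_zero_back)
```

Converting the statistic to a p value requires the t distribution, for example via `scipy.stats.ttest_ind`, which is not part of the standard library.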
A self-paced articulation task was done as an experimental control to determine sites related to speech motor production. Subjects were instructed to orally repeat a consonant–vowel syllable, /pa/, or a vowel sound /a/ or /e/, 30 times each.
The electrodes are embedded in clear silastic in an 8 × 8 array with 1 cm center-to-center spacing (Ad-tech Corp., Racine, WI). The electrodes are platinum–iridium disks with 2.3 mm contact diameter (impedance ∼1–5 kΩ). ECoG was amplified (SA Instrumentation, San Diego, CA) with a gain of 10,000 and a filter bandpass of 0.01–1000 Hz. Data were digitized at 2003 Hz using Datapac 2000 software (RUN Technologies, Mission Viejo, CA).
Data were referenced to the common average (CAR) (Crone et al., 2001). The CAR consists of the mean across all included electrodes, using only nonrejected sample points (rejected for electrode artifacts). The CAR approximates the activity contributed by the original reference; subtracting the CAR from each channel therefore largely removes contributions of the original reference electrode.
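The referencing step can be sketched in a few lines of Python. This is a minimal illustration, not the original MATLAB code: the function name `common_average_reference`, the nested-list data layout, and the choice to return NaN where every channel is rejected are all assumptions.

```python
import math

def common_average_reference(data, rejected):
    """Subtract the common average reference (CAR) from each channel.

    data:     list of channels, each a list of samples of equal length.
    rejected: same shape, True where a sample was marked as an artifact.
    """
    n_samples = len(data[0])
    car = []
    for t in range(n_samples):
        # Average only the channels whose sample at time t was not rejected.
        good = [ch[t] for ch, rej in zip(data, rejected) if not rej[t]]
        # Assumption: with no usable channel the reference is undefined (NaN).
        car.append(sum(good) / len(good) if good else math.nan)
    referenced = [[ch[t] - car[t] for t in range(n_samples)] for ch in data]
    return referenced, car

# Toy example: three channels, with an artifact on channel 1 at the last sample.
data = [[1.0, 2.0, 3.0], [3.0, 4.0, 100.0], [5.0, 6.0, 5.0]]
rejected = [[False, False, False], [False, False, True], [False, False, False]]
referenced, car = common_average_reference(data, rejected)
```

In the toy example the artifact on channel 1 is excluded from the average, so it does not contaminate the reference subtracted from the other channels.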
All data were imported into MATLAB (MathWorks, Natick, MA) for processing. Artifact rejection began with an automatic procedure to detect amplifier saturation or excessive power across all frequency bands indicating a transient artifact. The raw signal was also checked by visual inspection, and additional rejections were added manually using EEGLAB (Delorme & Makeig, 2004).
Time–frequency analyses used a Gaussian filter bank and the Hilbert transform on each channel separately (Edwards et al., 2005). There is one Gaussian filter (Gaussian-shaped window in the frequency domain) in the filter bank for each of 42 center frequencies (cfs) from 4 to 250 Hz. For each cf, the result is a time series of analytic amplitude (AA) of the same sampling rate (2003 Hz), duration (∼7 min), and units (mV) as the original ECoG recording. The AA is a formal means of obtaining the envelope of a signal. The envelopes of the frequency bands from 70 to 160 Hz are averaged together to form the HG AA used in all results shown. Although the full HG band ranges from ∼60 to 300 Hz, only the range from 70 to 160 Hz was used in order to avoid electrical artifacts at 60 and 180 Hz, and because this is the frequency range of maximal response. All single-trial data (Figure 1D) are shown in units of z-score, calculated by subtracting the baseline mean and dividing by the baseline standard deviation. All other results (Figure 1E) are shown in units of percentage (%) changes from the baseline mean. The baseline for all analyses was −250 to 0 msec prestimulus.
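The following Python sketch illustrates the steps just described: a Gaussian window in the frequency domain, the analytic amplitude, averaging across the 70 to 160 Hz bands, and baseline normalization. It is an illustration under stated assumptions rather than the original MATLAB implementation. The function names, the bandwidth parameter `sd`, the spacing of the center frequencies, and the use of the sample standard deviation are all assumptions, and a naive DFT is used only to keep the example free of dependencies.

```python
import cmath
import math
from statistics import mean, stdev

def analytic_amplitude(x, fs, cf, sd):
    """Envelope of x within a Gaussian frequency band (centre cf, width sd, in Hz)."""
    n = len(x)
    positive = range(1, (n + 1) // 2)  # positive-frequency bins, excluding DC and Nyquist
    # Naive O(n^2) DFT keeps the sketch dependency-free; use an FFT on real data.
    spectrum = {k: sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in positive}
    # Apply the Gaussian window, double the positive frequencies and drop the rest:
    # this is the frequency-domain form of the Hilbert (analytic signal) transform.
    weighted = {k: 2.0 * math.exp(-0.5 * ((k * fs / n - cf) / sd) ** 2) * spectrum[k]
                for k in positive}
    analytic = [sum(w * cmath.exp(2j * math.pi * k * t / n) for k, w in weighted.items()) / n
                for t in range(n)]
    return [abs(v) for v in analytic]

def high_gamma_amplitude(x, fs, centre_freqs, sd):
    """Average the band envelopes whose centre frequencies lie in 70-160 Hz."""
    bands = [analytic_amplitude(x, fs, cf, sd) for cf in centre_freqs if 70 <= cf <= 160]
    return [mean(samples) for samples in zip(*bands)]

def baseline_zscore(trace, baseline):
    """Subtract the baseline mean and divide by the baseline standard deviation."""
    mu, sigma = mean(baseline), stdev(baseline)
    return [(v - mu) / sigma for v in trace]

def baseline_percent_change(trace, baseline):
    """Express a trace as percentage change from the baseline mean."""
    mu = mean(baseline)
    return [100.0 * (v - mu) / mu for v in trace]

# Toy check: a 100 Hz tone of amplitude 3 sampled at 1000 Hz for 200 samples.
fs, n = 1000.0, 200
tone = [3.0 * math.cos(2 * math.pi * 100.0 * t / fs) for t in range(n)]
envelope = analytic_amplitude(tone, fs, cf=100.0, sd=10.0)  # stays close to 3.0
```

Because the tone falls exactly on a DFT bin at the filter's center frequency, the recovered envelope equals the tone amplitude at every sample, which is a convenient sanity check before applying the same steps to recorded data.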
Event-related averages of HG AA are taken relative to stimulus and response onsets. In order to show the sensory-locked and motor-locked behavior on one time line, the single trials are realigned to the median RT prior to averaging. The realignment procedure only affects the epoch from 500 msec poststimulus to 250 msec preresponse, which is essentially expanded/contracted to match the median RT. The data before this interval are average-locked to the stimulus, and the data after are average-locked to the response.
To assess if a change in AA is significantly different from baseline, we used a bootstrap resampling method. Significant latencies are indicated directly on the plots (Figure 1D) by lines over or under the trace. To assess if AA changes are significantly different between conditions, we used a permutation test and significant latencies are indicated on the plots (Figure 1D) with pink lines between the two conditions. Raw p values are corrected for multiple comparisons using the false discovery rate (FDR) approach (Benjamini & Hochberg, 1995), and these corrected p values are used in all cases. Full details for statistical analyses are included in the Supplemental Materials posted on-line at http://bil.ucsf.edu/targetdetection.
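To make the between-condition comparison concrete, the sketch below pairs a generic two-sided permutation test on the difference of condition means with the Benjamini-Hochberg adjustment. The test statistic, the number of permutations, the function names, and the amplitude values are assumptions for illustration; the authors' exact procedures are given in their Supplemental Materials and are not reproduced here.

```python
import random
from statistics import mean

def permutation_p_value(cond_a, cond_b, n_perm=999, seed=0):
    """Two-sided permutation test on the difference of condition means."""
    rng = random.Random(seed)
    observed = abs(mean(cond_a) - mean(cond_b))
    pooled = list(cond_a) + list(cond_b)
    n_a = len(cond_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign condition labels at random
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # Counting the observed labelling itself keeps p strictly above zero.
    return (hits + 1) / (n_perm + 1)

def fdr_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical single-trial HG amplitudes at two latencies (not data from the study).
target = [[5.0, 5.1, 4.9, 5.2, 5.0, 4.8, 5.1, 5.0],
          [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 1.0]]
nontarget = [[1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 1.0],
             [1.1, 1.0, 0.9, 1.2, 1.0, 0.9, 1.0, 1.1]]
raw_p = [permutation_p_value(a, b) for a, b in zip(target, nontarget)]
corrected_p = fdr_adjust(raw_p)
```

Each latency yields one raw p value, and the adjustment is applied across all latencies at once, so a latency is marked significant only if its corrected p value falls below the chosen threshold.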
Localization of implanted electrodes was conducted using a photograph–MRI–radiograph coregistration technique, described previously (Dalal et al., 2008). A high-resolution T1 MRI scan was obtained preoperatively for each patient as part of the clinical evaluation of epilepsy. 3-D MRI reconstruction was made using BET2 (www.fmrib.ox.ac.uk/analysis/research/bet) and MRIcro (www.sph.sc.edu/comd/rorden/mricro.html). Localizations were additionally constrained by the requirement to be on the cortical surface and by the known 1-cm interelectrode spacing. MNI coordinates were obtained for all electrodes using SPM2. Known control points are used to compute projective transforms that link the different image sets to refine the location of manually registered visible electrodes. The final result is a set of electrode positions on the patient's rendered MRI yielding locations relative to sulcal and gyral landmarks on individual anatomy.
All four subjects performed the phonological target detection task (Figure 1A) with >97% accuracy, and the median RT across all subjects was 674 msec (SD ±254 msec).
Time Course of Cortical Activation
The time series of the single-trial and average analytic amplitude (envelope) of HG for electrodes with significant activation are shown in Figure 1C and D, respectively. HG first robustly increased with short latency in the STG after stimulus onset (onset 55 msec, peak 120 msec). Event-related HG activity in the STG was characterized by a sharp rise after stimulus onset, and typically had a slow return to baseline over hundreds of milliseconds.
In the dorsal aspect of the STG, overlying the Sylvian fissure, and directly adjacent to the lateral planum temporale (PT), HG activation was equal in response to target and nontarget sounds. In contrast, in the ventral aspect of the STG, near the superior temporal sulcus, HG activation was modestly greater in response to target (red) compared to nontarget (blue) stimuli. In Figure 1D, the significantly different time points in HG activation between target and nontarget conditions are shown by the pink shading.
Soon after the onset of STG activation, HG was found in the precentral gyrus corresponding to superior ventral premotor cortex (sPMv) (onset 100–110 msec, peak 120–200 msec). Unlike pSTG activation, which peaked quickly, the sPMv response had a more symmetric temporal distribution (i.e., less kurtosis compared to the pSTG responses), suggesting that activation there was more likely induced than stimulus evoked. Similar to the STG, greater activation was observed in the sPMv during target compared to nontarget trials.
The same sPMv sites were also activated during articulation of phonemes (e.g., /a/, /pa/, /i/), confirming the functional colocalization to a speech motor area. During articulation, the onset of sPMv activation was observed prior to vocalization, and then followed by activation of the STG, representing auditory feedback (Figure 2).
For all subjects, the remaining cortical event-related HG activations occurred only during target trials. Cortical sites over lateral PFC demonstrated a significant increase in HG with delayed latencies (onset 180–320 msec, peak 300–600 msec). PFC responses lasted from 100 to 500 msec and engaged dorso- and ventrolateral subareas. Inspection of single trials (Figure 1C) sorted by RT revealed that PFC sites were more temporally aligned with the response than with the stimulus. The delayed timing and target-selective responsiveness of PFC reflect the decision processes and the initiation of motor planning and selection necessary for the button press.
Next, in the two subjects with adequate posterior coverage, HG was observed in the inferior parietal lobe (IPL) (onset 400–500 msec, peak 100–30 msec before button press response) (Figure 3; one other subject is provided in the Supplemental Materials on-line at http://bil.ucsf.edu/targetdetection). In the single-trial plots, the latencies of peak activation were better locked to the button press than to the speech stimulus.
The next site of activation was on the precentral gyrus (PrCG), located one electrode dorsal to speech motor cortex (onset 200–300 msec before button press, peak 100 msec after button press; Figure 1B and D). These sites were hand movement-related, as determined by low-threshold (3 mA) responses evoked by bipolar electrical cortical stimulation used for clinical mapping. Finally, a thumb somatosensory response was recorded over the postcentral gyrus (PoCG) (onset 10 msec, peak 120–180 msec after button press). In addition, activation at a separate inferior ventral premotor cortex (iPMv) site was observed during both the pre- and postresponse periods (i.e., overlapping both the motor and sensory parts of the response).
Working Memory Condition
One subject was able to carry out a working memory task in which she was instructed to respond when the (target) syllable was the same as the one presented two trials previously (also called a 2-back procedure). This condition requires continuous updating and active manipulation of the target representation, in contrast to our previous target detection task in which the target stimulus was prespecified (0-back) and subject to rapid and easy template matching. The stimulus sequence (pseudorandom presentation) and the behavioral response (button press) were the same across both conditions, and only the task rules were changed.
In Figure 4, the HG time series for all electrodes was superimposed over the positions of the grid electrodes. The extent of coverage for this array spans from ventrolateral PFC to the inferior parietal area, although it is mainly centered over the lower half of the central sulcus and the length of the Sylvian fissure. The RTs during the 2-back condition were significantly (p < .001) longer than those during the 0-back condition [median RT for 2-back (orange) = 748 msec, 0-back (red) = 618 msec]. In PFC, the 2-back target responses had increased amplitude and peak latency in the electrodes that showed a response to 0-back targets. Seven PFC electrodes not active in the 0-back showed selective activation in response to the 2-back targets. Responses in the iPMv and PoCG had longer latency but the same amplitude in the 2-back condition compared to the 0-back condition. These findings suggest that working memory depends strongly on PFC.
A passive listening condition (purple) was also added to determine the attentional changes related to the detection tasks. In both the pSTG and sPMv, active behavioral conditions (0-back and 2-back) had greater activation compared to the passive condition, suggesting a general attentional effect on sensory processing that was not task specific in those areas. Nontarget trials are shown in Figure 5.
We have demonstrated the spatio-temporal sequence of human cortical activation during phonological target detection. We tracked real-time neural activity across a large cortical network in the left hemisphere for a simple behavioral task that is typically completed in less than 1 sec—while functionally dissecting the underlying physiology of target from nontarget circuitry. The temporal sequence of activation revealed novel aspects of the serial and parallel nature of the cortical dynamics underlying sensory processing, response planning, and motor execution.
At the earliest stage, auditory responses over the lateral posterior PT were equal for target and nontarget sounds. The pSTG was found to be more responsive to speech syllables but also sensitive to the selective attentional demands of directed behavior. This lateral area has been previously shown to be more selective to speech compared to tones or other complex acoustic sounds (Edwards et al., 2009; Boatman & Miglioretti, 2005; Edwards et al., 2005; Boatman, Hall, Goldstein, Lesser, & Gordon, 1997). The target-selective enhancement of pSTG activity was present at the response onset, suggesting that top–down effects of attention are implicated at this higher level of sensory processing specific for speech sounds.
Prior studies have shown that auditory responses can be modulated by attention in single cortical neurons (Hubel, Henson, Rupert, & Galambos, 1959) and also in human scalp ERPs (Hillyard, Hink, Schwent, & Picton, 1973). Prior studies have also demonstrated that the receptive fields of neurons in auditory cortex (Fritz, Shamma, Elhilali, & Klein, 2003; Hubel et al., 1959) and somatosensory cortex (Ray, Hsiao, Crone, Franaszczuk, & Niebur, 2008) can be dynamically tuned for selective features depending on behavioral needs. Differential filtering as a result of rapid short-term synaptic plasticity could underlie the increased response gain for target stimuli (Zucker & Regehr, 2002).
The propagation of information originating from the pSTG followed two information processing streams in the early phase of phonological target detection. The first is rapid connectivity with a speech-specific motor module in the sPMv. This result adds to increasing evidence that motor cortex actively participates in sublexical speech perception, perhaps as pSTG accesses the articulatory network to compare externally driven auditory representations with internal motor representations.
The second stream is a delayed nonspatial cognitive auditory stream to PFC (“ventral stream” or auditory “what” pathway; Romanski, Tian, et al., 1999). In contrast to earlier sensory processing mechanisms, PFC was primarily activated only during target trials. This is consistent with observations that PFC neurons do not encode all stimuli, but rather process the decision or outcome in auditory (Russ, Orr, & Cohen, 2008), visual (Freedman, Riesenhuber, Poggio, & Miller, 2003; Kim & Shadlen, 1999; Rainer, Asaad, & Miller, 1998), and somatosensory detection tasks (de Lafuente & Romo, 2005). PFC activity was further increased and more sites were recruited when a 2-back working memory condition was added, which was not the case for the early sensory and later motor areas.
This load-dependent enhancement could be interpreted to represent executive processing, storage, or rehearsal functions involved in verbal working memory. We did not observe the sustained PFC neuronal activity that has been described as a memory trace in visual tasks (Cohen et al., 1997; Fuster & Alexander, 1971), although no such analog in auditory or verbal processing has previously been demonstrated. PFC changes were also unlikely to support subvocal rehearsal functions as the premotor areas for articulation were not implicated at that delayed phase of processing and were not task-dependent.
The results support the hypothesis that PFC is actively involved in the “behavioral” aspects of processing the goals and rules of a given task as well as initiating the output mechanisms needed to perform a given task (D'Esposito, 2007; Miller & Cohen, 2001). Taken together, our findings are supportive of single-unit findings in nonhuman primates demonstrating that PFC is adaptively responsive to relevant changes in the environment. Indeed, a hallmark of PFC damage is increased distractibility, resulting in inappropriate updating of environmental events (Chao & Knight, 1998).
A recent study using both fMRI and diffusion tensor imaging has provided evidence for anatomic structures supporting a dorsal-based articulatory pathway and a ventral-based comprehension or cognition pathway for speech. The dorsal connections from the pSTG to the sPMv are likely mediated through the arcuate and superior longitudinal fasciculus, whereas the ventral connections from the pSTG to ventrolateral PFC occur via the extreme capsule (Saur et al., 2008; Romanski, Tian, et al., 1999). Prior studies in nonhuman primates, also using physiologically defined rostral auditory belt sites in the pSTG, have shown specific anterograde tracer connections to PFC areas 12 (inferior convexity) and 45 (part of Broca's area) (Romanski, Bates, & Goldman-Rakic, 1999; Romanski, Tian, et al., 1999).
These findings delineate new aspects of planning and execution in human motor cortical processing. In this task, hand use in the button press was associated with PFC and IPL activation at middle-to-late latencies. This was followed by motor cortex activity and then somatosensory cortex activity, both related to hand movement and locked to the timing of the button press response. Furthermore, we observed separate hand-related iPMv activity that spanned the motor and sensory phases of the responses. This area has not been well characterized in humans, but by analogous location and functional properties, it appears to correlate to F5 in nonhuman primates (Rizzolatti & Luppino, 2001). iPMv receives dense input from prefrontal and parietal cortices, and has direct output to primary motor cortex. This region appears to have a supervisory role in planning behavioral responses and monitoring the consequences of the outcomes.
Target detection has been extensively associated with the scalp EEG-derived P300 event-related potential (Polich, 2007; Sutton, Braren, Zubin, & John, 1965). Despite the simplicity of the task and reliability of the P300, it is still unclear how and where it is produced. One question in the P300 literature is whether the various components (P3a, P3b, etc.) are to be considered more sensory, cognitive, or response-related. We show activations related to all three stages expressed over space and time, demonstrating that target detection is not a unitary cognitive event. For example, much of the target-detection activation appears related to response preparation, and this is observed in regions (peri-Rolandic cortices) that make sense for motor preparation. These findings are exploratory, but they do provide a reliable contribution to the spatio-temporal imaging of target detection, which is a classic and fundamental part of many, perhaps most, psychological and cognitive experimental designs.
During nontarget conditions, only the sensory-evoked potentials (i.e., N100) are obtained after signal averaging. During target conditions, attentional resources are allocated to the target such that a P300 potential is generated in addition to the sensory-evoked potentials. The P300 is hypothesized to index attentional resources, such that when task conditions are undemanding, amplitude is enhanced and peak latency is shortened. From our single-trial results, it can be seen why response averaging during easy task conditions leads to a high-amplitude, short-latency P300 as the activity is well time-locked. In contrast, as shown by our 2-back findings, more difficult tasks engage more frontal areas and peak at longer latencies as there is greater variability in the timing of cognitive events. However, it is difficult to further speculate because the P300 integrates much larger sources at the level of the scalp and we were unable to perform simultaneous EEG and ECoG recordings due to technical constraints related to head bandages.
A distinct experimental advantage of ECoG is the ability to record from multiple sites simultaneously in real-time, in contrast to some of the sampling limitations of single-unit recordings and the temporal constraints of fMRI. Nonetheless, ECoG in this experiment also had specific limitations. First, the extent of grid coverage in humans was guided by the clinical indications for their epilepsy localization. In some cases, the standard grid did not cover both the frontal and parietal areas. Tasks such as target detection also likely involve more brain areas, including sites in the nondominant hemisphere, medial frontal cortex, hippocampus, and cerebellum. Second, the electrode contacts are limited to the gyral cortical surface, and therefore, do not effectively sample the intrasulcal and subcortical areas of potential interest. Third, the electrode contacts themselves are likely representing the population responses of thousands of neurons, thus it is difficult to extrapolate more spatially discrete processing occurring in cortical microcircuits with the currently available grid electrodes.
Despite these limitations, in this article, we focused on HG oscillations to track the spatio-temporal pattern of activation related to target detection. The exact sources for HG oscillations are actively under investigation, although recent studies have demonstrated a close correspondence to interneuron multiunit activity recorded at mid-laminar depths (Ray et al., 2008; Steinschneider, Fishman, & Arezzo, 2008). Several reports have observed that HG tracks auditory (Crone et al., 2001), motor (Miller et al., 2007), and language (Canolty et al., 2007) processing, and interacts with functional brain rhythms such as theta. Here we extend these results to the higher-order cognitive processes occurring in multiple cortical areas during target detection.
Our findings confirm that HG can effectively track the cascade of activations relevant for a host of cognitive operations and reveal an orderly extraction of sensory information, decision-making, and behavioral output in human cortex.
This research was supported by NIH grants NS21135, PO4813 (R. T. K.), F32NS061552, K99NS065120 (E. C.), F32NS061616 (E. E.), and F31DC004855 (S. S. D.).
Reprint requests should be sent to Edward F. Chang, Department of Neurological Surgery, University of California, San Francisco, 505 Parnassus Avenue, M779, San Francisco, CA 94143, or via e-mail: ChangEd@neurosurg.ucsf.edu.
Contributed equally to this work.