Abstract

Sensitivity to temporal change places fundamental limits on object processing in the visual system. An emerging consensus from the behavioral and neuroimaging literature suggests that temporal resolution differs substantially for stimuli of different complexity and for brain areas at different levels of the cortical hierarchy. Here, we used steady-state visually evoked potentials to directly measure three fundamental parameters that characterize the underlying neural response to text and face images: temporal resolution, peak temporal frequency, and response latency. We presented full-screen images of text or a human face, alternated with a scrambled image, at temporal frequencies between 1 and 12 Hz. These images elicited a robust response at the first harmonic that showed differential tuning, scalp topography, and delay for the text and face images. Face-selective responses were maximal at 4 Hz, whereas text-selective responses were maximal at 1 Hz. The topography of the text image response was strongly left-lateralized at higher stimulation rates, whereas the response to the face image was slightly right-lateralized but nearly bilateral at all frequencies. Both text and face images elicited steady-state activity at more than one apparent latency; we observed early (141–160 msec) and late (>250 msec) text- and face-selective responses. These differences in temporal tuning profiles are likely to reflect differences in the nature of the computations performed by word- and face-selective cortex. Despite the close proximity of word- and face-selective regions on the cortical surface, our measurements demonstrate substantial differences in the temporal dynamics of word- versus face-selective responses.

INTRODUCTION

Neurons in visual cortex are tuned to a myriad of features of the visual stimulus ranging from simple image statistics, such as spatial frequency, orientation, and disparity (De Valois, Albrecht, & Thorell, 1982; Barlow, Blakemore, & Pettigrew, 1967; Hubel & Wiesel, 1962), to dynamic properties, such as stimulus duration and direction of motion (Movshon, Thompson, & Tolhurst, 1978; Hubel & Wiesel, 1965), to high-level features, such as semantic similarity and category membership (Grill-Spector & Weiner, 2014; Huth, Nishimoto, Vu, & Gallant, 2012; Kanwisher, McDermott, & Chun, 1997). Regions of visual cortex that are sensitive to particular visual categories, such as the fusiform face area (FFA), which responds selectively to faces, and the visual word form area, which responds selectively to words, are believed to perform computations that are critical for the perception of these stimulus classes (Grill-Spector & Weiner, 2014; Wandell, Rauschecker, & Yeatman, 2012; Cohen et al., 2002; Kanwisher et al., 1997). For example, disruption of signals in the FFA through electrical stimulation impairs face perception (Jonas et al., 2012; Parvizi et al., 2012), and lesions in the vicinity of the visual word form area impair the ability to rapidly recognize words (a condition known as pure alexia or word blindness; Gaillard et al., 2006; Dejerine, 1891).

Despite the striking sensitivity of these ventral occipitotemporal regions to category membership, low-level features of the visual stimulus still influence neural responses. Understanding the low-level stimulus features that drive responses in ventral occipitotemporal cortex has helped elucidate fundamental aspects of visual computation and perception. For example, spatial tuning, one of the most extensively studied properties of neurons in visual cortex, has been fundamental for understanding differences in the computations performed by different visual regions and linking computation to perceptual function. Ventral stream regions that are important for the perception of objects, including words and faces, predominantly receive inputs from the foveal representations of early visual areas, and consequently the responses of these regions are principally driven by stimuli in the center of the visual field (Hasson, Levy, Behrmann, Hendler, & Malach, 2002; Levy, Hasson, Avidan, Hendler, & Malach, 2001). This foveal bias is believed to underlie our poor perceptual performance for objects in the periphery. For example, word recognition in the periphery is substantially slower and less accurate than would be predicted by visual acuity alone (Chung, Mansfield, & Legge, 1998).

The temporal properties of the visual system also impose fundamental limits on cortical computations but have received far less attention than spatial properties. Temporal tuning properties of the visual system can be characterized by three fundamental parameters: (1) temporal resolution or temporal acuity (i.e., the highest temporal frequency that elicits a response to a given visual feature), (2) the temporal frequency that elicits the maximal response to that feature, and (3) the delay of the response with respect to the stimulus onset (latency). The fastest rate at which neurons can track changes in a stimulus is related to the integration time of the system: Neurons that integrate over long time periods effectively low-pass filter their inputs and have low temporal acuity/resolution.
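To make this last relationship concrete, the sketch below (an illustration of our own, not part of the study) models temporal integration as a simple moving-average (boxcar) filter and shows that lengthening the integration window progressively attenuates responses to higher temporal frequencies, which is exactly the sense in which long integration times limit temporal acuity.

```python
# Minimal sketch (not from the study): a longer temporal integration window
# acts as a low-pass filter, reducing the highest stimulus frequency a unit can track.
import numpy as np

fs = 1000.0                       # sampling rate of the simulation (Hz)
t = np.arange(0, 2.0, 1.0 / fs)   # 2 sec of simulated time

def boxcar_gain(stim_freq_hz, integration_ms):
    """Steady-state gain of a moving-average integrator for a sinusoidal input."""
    stim = np.sin(2 * np.pi * stim_freq_hz * t)
    win = np.ones(int(integration_ms * fs / 1000.0))
    win /= win.sum()                                    # unit-gain integrator
    resp = np.convolve(stim, win, mode="same")
    mid = slice(len(t) // 4, -len(t) // 4)              # ignore edge effects
    return resp[mid].std() / stim[mid].std()

for integration_ms in (10, 100, 500):                   # short vs. long integration times
    gains = [round(boxcar_gain(f, integration_ms), 2) for f in (1, 4, 12)]
    print(f"{integration_ms:>3} msec window -> gain at 1, 4, 12 Hz: {gains}")
```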

In the case of simple features such as luminance and contrast, temporal resolution is very high (Kelly, 1961a, 1961b), but for more complex features and objects, temporal resolution is much lower (Holcombe, 2009; McMains & Somers, 2004; Battelli, Cavanagh, Martini, & Barton, 2003; Potter & Faulconer, 1975). A parallel temporal hierarchy has also been observed as one progresses from early visual cortex to extrastriate areas in the temporal lobe. Early PET measurements in striate cortex indicated that peak responses to reversing checkerboards occurred between 4 and 15 Hz and similar tuning was observed using fMRI (Thomas & Menon, 1998; Zhu et al., 1998; Kwong et al., 1992) with a consensus that peak responses occur near 8 Hz (but see Ozus et al., 2001, which reported that the peak response plateaus at 6 Hz). Temporal integration of more complex information present in natural object images was first reported to differ between early retinotopic cortex and higher-order occipitotemporal areas by Mukamel, Harel, Hendler, and Malach (2004). Using fMRI, they found that, although activation increased by 200% in early visual cortex for presentation rates between 1 and 4 Hz, the increase was only 25% in occipitotemporal cortex. The difference was attributed to differences in integration time among areas that are at different stages of the visual hierarchy. Later work (McKeeff, Remus, & Tong, 2007) compared temporal tuning profiles over both retinotopic visual areas and occipitotemporal areas that were selectively responsive to face images (FFA) or house images (parahippocampal place area [PPA]). They found that maximal activation occurred around 18 Hz in early visual areas V1–V3, at ∼9 Hz in V4, but at only 4–5 Hz in FFA and PPA. In a complementary study (Hasson, Yang, Vallines, Heeger, & Rubin, 2008), silent films were temporally scrambled by cutting them into time segments of varying duration and randomizing the order of presentation. Activation in later visual areas was maximal for longer segments, suggesting that high-level areas integrate information over long time periods. Gauthier, Eger, Hesselmann, Giraud, and Kleinschmidt (2012) alternated a single face image with a single house image using rates between 1.2 and 10 Hz. They found a progressive decrease in the optimal frequency of presentation going from V1 to the lateral occipital complex to FFA and PPA. What is clear from this collection of studies is that temporal response properties slow down at higher stages in the visual system and that these response properties place fundamental constraints on perception. This suggests a simple hypothesis: Responses to stimuli (or features) represented at similar levels of the visual hierarchy will have similar temporal dynamics.

Here we use evoked potential measures of temporal processing as a means to compare the temporal limits of word- and face-selective cortex. This choice is motivated by the fact that word- and face-selective regions are immediately adjacent (within a few millimeters) on the ventral surface (Yeatman, Rauschecker, & Wandell, 2013; Wandell et al., 2012; Dehaene et al., 2010). We therefore might expect these regions to share equivalent tuning properties even though the computations required to read a word (Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001; Seidenberg & McClelland, 1989) are certainly very different from the computations required to recognize a face (Meyers, Borzello, Freiwald, & Tsao, 2015). In support of the hypothesis that there is a canonical temporal processing profile in adjacent category-selective regions, both words and faces produce a characteristic ERP at comparable latencies (150–170 msec) after the presentation of the visual stimuli (Maurer, Brandeis, & McCandliss, 2005; Bentin, Mouchetant-Rostaing, Giard, Echallier, & Pernier, 1999; Bentin, Allison, Puce, Perez, & McCarthy, 1996). Although the N150–N170 for words and faces each have distinct scalp topographies (Rossion, Joyce, Cottrell, & Tarr, 2003), the temporal similarity between their ERP responses could be hypothesized to reflect consistent temporal tuning properties of neurons across ventral temporal cortex: If one makes the assumption that the ERP is equivalent to the impulse response of a linear system, then one would predict that the temporal tuning of faces and text should be very similar, given the similarity in the latency of the selective activity in the two tasks. An alternative hypothesis is that temporal response properties depend substantially on the specific nature of the computations that the visual system performs on different categories of stimuli, such as words and faces.

This study uses steady-state visually evoked potentials (SSVEPs) to test the hypothesis that there are canonical temporal response properties for regions at the same level of the visual hierarchy (for a recent review of the SSVEP approach, see Norcia, Appelbaum, Ales, Cottereau, & Rossion, 2015). Using the SSVEP, we assessed the temporal frequency tuning preference, the temporal resolution, and the apparent latency of word- and face-selective cortex. Despite the similarity of the N170 response to words and faces, we find markedly distinct temporal properties for the two categories of stimuli.

METHODS

Participants

Eleven adults (four women) between the ages of 18 and 56 years participated. They had normal visual acuity and were screened for neurological and cognitive impairments. Each participant provided written informed consent under a protocol that conformed to the tenets of the Declaration of Helsinki that was approved by the institutional review board of Stanford University.

Stimuli

The text image comprised a block of common English words derived from the MCWord database (www.neuro.mcw.edu/mcword/). The face image comprised a black and white photograph of a cropped female head and face placed on a random texture background. Images extended 12° in each direction from a fixation cross in the center of the screen. To provide a comparison stimulus with the same low- and mid-level image statistics, each image was scrambled using the algorithm developed by Portilla and Simoncelli (2000), which is available at www.cns.nyu.edu/∼lcv/texture/. The algorithm learns the joint distribution of filter locations, orientations, and scales from the image (separate distributions were computed for the text and face images) and preserves this histogram in the synthesized, scrambled version. Stimuli are shown in Figure 1.

Figure 1. 

Stimuli. Image sequences consisted of periodic alternations of an intact image (text or face) with a scrambled image whose lower-level statistics were equated to the text or face image, respectively. Image sequences were presented at seven stimulus frequencies spanning 1–12 Hz.

Intact and scrambled versions of the stimuli were presented in temporal alternation at seven frequencies: 1, 2, 3, 4, 6, 9, and 12 Hz. These frequencies were chosen based on prior SSVEP work on faces (Alonso-Prieto, Belle, Liu-Shuang, Norcia, & Rossion, 2013) because we expected (a) the amplitude of the odd harmonic to drop close to the noise floor by 12 Hz and (b) more rapid changes in amplitude as a function of frequency at low compared with high frequencies, motivating denser sampling of the lower frequencies (1–6 Hz). Observers were given a fixation mark in the center of the image and were instructed to hold their fixation on the mark and to refrain from blinking. The image sequences were presented for 12 sec, with the first and last seconds being excluded from the analysis of the SSVEP. Five trials were run for each temporal frequency and image type, with the stimuli presented in random order.
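As a concrete illustration of this presentation scheme, the following sketch (our reconstruction, not the authors' presentation code) builds the frame-by-frame image sequence for one trial at a given alternation frequency; the 60-Hz refresh rate and the assumption that the intact image occupies the first half of each cycle are ours.

```python
# Sketch of one trial's frame schedule (a reconstruction, not the authors' code).
# Assumptions: 60-Hz display; the intact image occupies the first half of each cycle.
import numpy as np

refresh_hz = 60.0        # assumed monitor refresh rate
trial_sec = 12.0         # trial duration; first and last seconds are discarded

def frame_schedule(alt_freq_hz):
    """Per-frame labels (1 = intact, 0 = scrambled) plus a 10-sec analysis mask."""
    n_frames = int(round(trial_sec * refresh_hz))
    t = np.arange(n_frames) / refresh_hz                # onset time of each frame
    phase = (t * alt_freq_hz) % 1.0                     # position within the cycle
    intact = (phase < 0.5).astype(int)                  # intact during first half-cycle
    keep = (t >= 1.0) & (t < trial_sec - 1.0)           # exclude first and last second
    return intact, keep

for f_hz in (1, 2, 3, 4, 6, 9, 12):                     # the seven stimulation rates
    intact, keep = frame_schedule(f_hz)
    print(f"{f_hz:>2} Hz: {intact[keep].sum()} intact frames in the 10-sec analysis window")
```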

EEG Recording and SSVEP Analysis

EEG was recorded over 128 channels at a sampling rate of 500 Hz using HydroCell SensorNets (Electrical Geodesics Inc., Eugene, OR) connected to an Electrical Geodesics NetAmp 300 running NetStation 4.3 software. Data analysis was performed offline using in-house software after exporting the data and digital bandpass filtering between 0.3 and 200 Hz.
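For readers reproducing this preprocessing step, a minimal sketch follows. The 0.3–200 Hz passband and 500-Hz sampling rate come from the text; the use of SciPy and a fourth-order zero-phase Butterworth filter are our assumptions, since the authors used unspecified in-house software.

```python
# Preprocessing sketch. Passband and sampling rate follow the text; the filter
# design (4th-order Butterworth, zero-phase, SciPy) is an assumption.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 500.0                                                        # EEG sampling rate (Hz)
sos = butter(4, [0.3, 200.0], btype="bandpass", fs=fs, output="sos")

def bandpass_eeg(eeg):
    """Apply a zero-phase band-pass filter along time (channels x samples array)."""
    return sosfiltfilt(sos, eeg, axis=-1)

# Example: 128 channels, one 12-sec trial of simulated data.
eeg = np.random.randn(128, int(12 * fs))
filtered = bandpass_eeg(eeg)
```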

The SSVEP was extracted from the individual 10-sec trials by first averaging five consecutive 2-sec bins of the original 10-sec trial record in the time domain, yielding a spectral resolution of 0.5 Hz. The Fourier coefficients at the first harmonic were then averaged coherently to determine the amplitude and phase of the response for each stimulus condition for each participant. Previous work (Liu-Shuang, Ales, Rossion, & Norcia, 2015) has shown that the first harmonic of the SSVEP to alternations between intact and scrambled natural images is generated predominantly by responses to the higher-order configural information in the natural image. An estimate of SSVEP delay (d) with respect to the stimulus was calculated from the slope of the phase versus frequency function (Lopes da Silva, van Rotterdam, Storm van Leeuwen, & Tielen, 1970),
d = (1/360) × (δϕ/δf),   (1)
where δϕ is the change in phase over the frequency range in degrees and δf is the change in frequency in Hz. This relationship is derived from the properties of physical systems that are "causal," that is, systems whose output can come only after, not before, the input. In such systems, the real and imaginary components are tied together via the Kramers–Kronig relationship: knowing the real component at a given frequency implies knowing the imaginary component. A related relationship, discovered by Bode in 1937, ties the shape of the system's gain (amplitude) versus frequency function to the slope of its phase versus frequency function (see Bechoefer, 2011, for a review).
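The sketch below illustrates these two analysis steps on simulated data: coherent (complex) averaging of the first-harmonic Fourier coefficient, and estimation of apparent latency from the slope of the unwrapped phase versus frequency function (Equation 1). The 2-sec bins, 0.5-Hz resolution, and phase-slope formula follow the text; the simulated 150-msec delay, noise level, and trial count are illustrative only.

```python
# Illustrative SSVEP analysis sketch following the steps described above:
# (1) average five consecutive 2-sec bins of each 10-sec trial in the time domain,
# (2) take the complex Fourier coefficient at the stimulation frequency and
#     average it coherently across trials, and
# (3) estimate apparent latency from the slope of phase vs. frequency (Equation 1).
import numpy as np

fs = 500.0                        # sampling rate (Hz)
bin_len = int(2 * fs)             # 2-sec bins -> 0.5-Hz spectral resolution

def first_harmonic(trial, stim_freq_hz):
    """Complex coefficient at the stimulation frequency for one 10-sec trial."""
    bins = trial[: 5 * bin_len].reshape(5, bin_len)
    mean_bin = bins.mean(axis=0)                        # time-domain average of bins
    spectrum = np.fft.rfft(mean_bin) / (bin_len / 2)    # amplitude-scaled spectrum
    freqs = np.fft.rfftfreq(bin_len, d=1 / fs)          # 0.5-Hz frequency spacing
    return spectrum[np.argmin(np.abs(freqs - stim_freq_hz))]

def apparent_latency_ms(stim_freqs_hz, coeffs):
    """Latency from the slope of unwrapped phase (deg) vs. frequency (Hz), Equation 1."""
    phase_deg = np.degrees(np.unwrap(np.angle(coeffs)))
    slope = np.polyfit(stim_freqs_hz, phase_deg, 1)[0]  # degrees per Hz
    return abs(slope) / 360.0 * 1000.0                  # msec

# Simulated example: responses delayed by 150 msec at each stimulation frequency.
stim_freqs = np.array([1.0, 2.0, 3.0, 4.0, 6.0])
true_delay = 0.150
t = np.arange(int(10 * fs)) / fs
coeffs = []
for f in stim_freqs:
    trials = [np.sin(2 * np.pi * f * (t - true_delay)) + 0.1 * np.random.randn(t.size)
              for _ in range(5)]                                      # five noisy trials
    coeffs.append(np.mean([first_harmonic(tr, f) for tr in trials]))  # coherent average
print("estimated latency:",
      round(apparent_latency_ms(stim_freqs, np.array(coeffs)), 1), "msec")
```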

RESULTS

Word- and Face-selective Responses Have Different Temporal Tuning Curves

We find that word- and face-selective responses each have a unique temporal tuning curve, preferred stimulus frequency, and scalp topography (Figure 2). Word-selective cortex shows a peak response to text presented at 1 Hz, and the amplitude of the response declines monotonically as a function of presentation frequency. The brain no longer tracks the change from scrambled to intact text at presentation frequencies above 9 Hz. Face-selective cortex shows a peak response to faces presented at 4 Hz, and the amplitude of the response declines for slower or faster presentation frequencies. For faces, the response is equivalent for 1-Hz and 6-Hz presentation rates. Both word- and face-selective regions show equivalent and minimal responses to stimuli presented at 9 Hz. The scalp topography for words goes from being nearly equal across the two hemispheres at 1 Hz to being strongly left-lateralized at 4 Hz. The right hemisphere word response declines more rapidly as a function of presentation rate than the left hemisphere word response (Figure 2). By contrast, the response to the face image is almost equal for both hemispheres (with a slight right hemisphere preference) at all frequencies where it is measurable, and there is not a substantial change in lateralization at different presentation rates.

Figure 2. 

Temporal tuning of word- and face-selective cortex. Top: Scalp topography of the first harmonic response as a function of temporal frequency for text and face image sequences. Dashed lines indicate ROIs identified on the basis of amplitude maxima in the group average maps. The maximum response to words was at electrode 65, and the maximum response to faces was at electrode 83. The results are very similar for electrode 90 (right hemisphere homologue of electrode 65). Bottom: Temporal frequency tuning functions for text (gray curves) and face (black curves) stimuli for left (ROI 1) and right (ROI 2) hemisphere ROIs. Face image tuning functions peaked at 4 Hz, and text image functions were maximal at 1 Hz. Face image tuning functions extend to higher temporal frequencies than do text image functions.

Latency Topography Demonstrates Two Distinct Sources at Two Different Times

By comparing SSVEP phase values across temporal frequencies, we derived latency estimates for responses to face and word images (see Equation 1). In a linear time-invariant system with a fixed delay, the phase of the response is a linear function of stimulus frequency. This linear relationship indicates that all frequencies are delayed by the same constant amount (constant group delay). Consistent with the underlying model assumption of a linear time-invariant system, the phase versus frequency functions are linear for text and face stimuli. They differ, however, in slope, with the inferred delay differing by region and stimulus category. By mapping delay over the sensor array, it is apparent that both words and faces show two distinct latencies (Figure 3). This observation suggests the existence of at least two different underlying sources. In the occipitotemporal ROIs, the shortest delay for text is 140.0 ± 6.6 msec but is 159.4 ± 3.0 msec for the face image. A longer latency source is apparent over left occipitotemporal cortex for the text stimuli with a latency of 257.6 ± 7.1 msec. For the face stimuli, longer latency activity is present over right anterior temporal cortex at a latency of 287.9 ± 11.8 msec.

Figure 3. 

Implicit time for text and face responses. Top: Response latency in milliseconds is color-coded with cool colors indicating shorter latencies and warm colors longer latencies (see color bar on right). The maps were thresholded to exclude channels where more than one data point was unreliable due to the lack of statistically significant SSVEP responses. Both (A) text and (B) face maps contain regions with more than one delay. Bottom: SSVEP phase versus frequency plots for text (left) and face (right images) for selected ROIs indicated by dashed circles. Linear regression fits to the phase versus frequency function are indicated by the solid lines, with the corresponding estimates of latency ±1 SEM.

DISCUSSION

By measuring both the amplitude and phase of the SSVEP as a function of temporal frequency, we derive a richer description of the dynamics of word and face processing than has been possible with traditional ERP measurements, PET, or fMRI. From our measurements, we determined that temporal acuity, peak response frequency, and delay each differ for text and face images. These differences in temporal tuning profiles might be surprising considering (a) word- and face-selective ERPs have been described to have a similar time delay (Cao, Jiang, Gaspar, & Li, 2014; Pegna, Khateb, Michel, & Landis, 2004; Rossion et al., 2003), (b) word- and face-selective regions are immediately adjacent on the ventral surface of the cortex (Yeatman et al., 2013; Wandell et al., 2012; Dehaene et al., 2010), and (c) word- and face-selective regions have been hypothesized to share a common neuronal architecture (Dehaene et al., 2010; Dehaene & Cohen, 2007).

Differences in temporal tuning profiles reflect differences in the nature of the computations performed by word- and face-selective cortex. Despite the close spatial proximity of these regions, our measurements suggest that there must be substantial differences in either the neuronal architecture of, or the hierarchy of regions that feed signals into, word- and face-selective cortex. We find that temporal acuity for faces is substantially higher than for text—the amplitude of the face-selective response at 4–6 Hz is several times higher than the text-selective response. Hence, regions that process faces are more sensitive to rapidly changing stimuli than regions that process text. This observation predicts that perceptual decisions will show markedly different time courses for words and faces.

Previous work has found that the differential SSVEP response to changing identity faces versus constant identity faces is maximal at 6 Hz (Alonso-Prieto et al., 2013). One interpretation of this peak frequency is that it is due to the linear superposition of transient ERPs with a latency of 150–170 msec. However, it is important to note that the latency of an ERP is influenced by two factors: (1) integration time, or the amount of time required for a brain region to process the incoming information and reach a maximal response, and (2) conduction delay, or the amount of time required for the signal to reach this brain region. Hence, the similar ERP latency for words and faces does not by itself indicate that temporal processing is equivalent in word- and face-selective cortex.

Here, we find the best temporal frequencies for driving cortical responses are substantially lower for text (1 Hz) than for face (4 Hz) images. A direct tying of these peak frequencies to transient response latencies via the superposition model would predict latencies of 1000 msec for transient ERPs to words and 250 msec for face responses. These predicted latencies are clearly inconsistent with the common 150–170 msec ERP latency for both stimulus categories (Cao et al., 2014; Pegna et al., 2004; Rossion et al., 2003). This finding shows that, under a different set of measurement conditions, the temporal aspects of the signal in word- and face-selective cortex can be substantially different despite previous reports noting similarities between the ERP waveform.
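Spelling out the arithmetic behind these predicted latencies: under the simplest superposition account, the predicted transient latency is the period of the optimal stimulation frequency,

\[ T_{\mathrm{pred}} = \frac{1}{f_{\mathrm{peak}}}: \qquad \frac{1}{1\ \mathrm{Hz}} = 1000\ \mathrm{msec\ (text)}, \qquad \frac{1}{4\ \mathrm{Hz}} = 250\ \mathrm{msec\ (faces)}. \]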

Finally, in addition to the mixture of fixed conduction delays and integration delays inherent in visual processing, the visual system is also manifestly nonlinear, and the conditions under which SSVEP measurements are made—temporally dense stimuli—are very different from the temporally sparse conditions used to measure ERP parameters. The presence of temporal nonlinearities, such as adaptation, also makes it difficult to make direct predictions in the absence of a full nonlinear model of the system response. Here we used the first harmonic of the evoked response as a proxy measure and, finding the phase–frequency relationship to be linear, were able to calculate aggregate delay measures for the two stimulus classes we used.

This is the first EEG study to use the Portilla–Simoncelli algorithm (Portilla & Simoncelli, 2000) to create the baseline condition against which the object-level response is compared. This algorithm preserves a set of higher-order, joint statistics that are lost when the image's phase spectrum is scrambled. Our paradigm thus isolates responses (at the first harmonic) to text and face images that are higher-order than those driven by the power spectrum of the image. They are also higher-order than responses driven by the joint statistics encoded by the Portilla and Simoncelli algorithm. Previous work in macaque (Rust & Dicarlo, 2010) has found that responses in inferior temporal cortex differ between intact and scrambled versions of the same image to a greater degree than do the responses in V4 when the Portilla–Simoncelli algorithm is used. A recent report using fMRI in humans (Movshon & Simoncelli, 2014; Freeman, Ziemba, Simoncelli, & Movshon, 2013) contrasted responses to Portilla–Simoncelli scrambled textures and intact natural textures and found that differential responses occurred only at and beyond area V4. Our approach may thus make the resulting SSVEP more selective to the intrinsic structure of orthography and faces than other approaches such as phase scrambling.

By mapping the temporal delay over the electrode array, we find evidence for multiple underlying sources on the basis of significantly different response delays. It is interesting to note that even these long latency sources continue to respond to steady-state stimulation. In the case of the text response, longer latency activity may reflect increasingly complex orthographic processing. In the case of face-related activity, the long latency responses over right anterior temporal cortex may arise in the "extended" face network (Haxby, Hoffman, & Gobbini, 2000) that includes anterior inferotemporal cortex (Kriegeskorte, Formisano, Sorger, & Goebel, 2007). Consistent with this interpretation, intracranial recordings with similar stimuli have found SSVEP responses to face images in anterior inferior temporal cortex (Liu-Shuang, Jonas, et al., 2015). Previous transient ERP studies have found a negativity around 250 msec for face stimuli (Schweinberger, Huddy, & Burton, 2004; Schweinberger, Pickering, Jentzsch, Burton, & Kaufmann, 2002) and for objects such as birds or cars after expertise training (Scott, Tanaka, Sheinberg, & Curran, 2006, 2008). These responses are sensitive to repetition and familiarity effects that are not seen in the N170 response. Our approach may be tapping a similar process, as both faces and text are highly overlearned stimuli in typical adults.

Conclusions

SSVEPs represent a promising approach for characterizing the temporal dynamics of high-level visual regions that are selective for text, faces, and other important visual categories. Temporal tuning curves can be reliably estimated from relatively short stimulation paradigms, opening the possibility of studying changes in neural dynamics over the course of development (e.g., learning to read) and in the case of developmental disorders (e.g., dyslexia and prosopagnosia). Our measurements clearly demonstrate that the temporal dynamics of word- versus face-selective cortex differ substantially, laying the foundation for models that relate temporal processing to perception and behavior.

Acknowledgments

The authors wish to acknowledge the contributions of Faraz Farzin to the conduct of the EEG recordings and early conceptual design of the study. She was supported by a Ruth L. Kirschstein National Research Service Award (F32EY021389).

Reprint requests should be sent to Jason D. Yeatman, Institute for Learning & Brain Sciences (I-LABS), Department of Speech and Hearing Sciences, University of Washington, 1715 Columbia Road N, Portage Bay Building, Seattle, WA 98115, or via e-mail: jyeatman@uw.edu.

REFERENCES

Alonso-Prieto, E., Belle, G. V., Liu-Shuang, J., Norcia, A. M., & Rossion, B. (2013). The 6 Hz fundamental stimulation frequency rate for individual face discrimination in the right occipito-temporal cortex. Neuropsychologia, 51, 2863–2875.
Barlow, H. B., Blakemore, C., & Pettigrew, J. D. (1967). The neural mechanism of binocular depth discrimination. Journal of Physiology, 193, 327–342.
Battelli, L., Cavanagh, P., Martini, P., & Barton, J. J. (2003). Bilateral deficits of transient visual attention in right parietal patients. Brain, 126, 2164–2174.
Bechoefer, J. (2011). Kramers-Kronig, Bode, and the meaning of zero. American Journal of Physics, 79, 1053–1059.
Bentin, S., Allison, T., Puce, A., Perez, E., & McCarthy, G. (1996). Electrophysiological studies of face perception in humans. Journal of Cognitive Neuroscience, 8, 551–565.
Bentin, S., Mouchetant-Rostaing, Y., Giard, M. H., Echallier, J. F., & Pernier, J. (1999). ERP manifestations of processing printed words at different psycholinguistic levels: Time course and scalp distribution. Journal of Cognitive Neuroscience, 11, 235–260.
Cao, X., Jiang, B., Gaspar, C., & Li, C. (2014). The overlap of neural selectivity between faces and words: Evidences from the N170 adaptation effect. Experimental Brain Research, 232, 3015–3021.
Chung, S. T., Mansfield, J. S., & Legge, G. E. (1998). Psychophysics of reading. XVIII. The effect of print size on reading speed in normal peripheral vision. Vision Research, 38, 2949–2962.
Cohen, L., Lehericy, S., Chochon, F., Lemer, C., Rivaud, S., & Dehaene, S. (2002). Language-specific tuning of visual cortex? Functional properties of the visual word form area. Brain, 125, 1054–1069.
Coltheart, M., Rastle, K., Perry, C., Langdon, R., & Ziegler, J. (2001). DRC: A dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108, 204–256.
De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22, 545–559.
Dehaene, S., & Cohen, L. (2007). Cultural recycling of cortical maps. Neuron, 56, 384–398.
Dehaene, S., Pegado, F., Braga, L. W., Ventura, P., Nunes Filho, G., Jobert, A., et al. (2010). How learning to read changes the cortical networks for vision and language. Science, 330, 1359–1364.
Dejerine, J. (1891). Sur un cas de cécité verbale avec agraphie, suivi d'autopsie. Mémoires de la Société de Biologie, 3, 197–201.
Freeman, J., Ziemba, C., Simoncelli, E. P., & Movshon, J. A. (2013). Functionally partitioning the ventral stream with controlled natural stimuli. Society for Neuroscience Abstracts, 406.01.
Gaillard, R., Naccache, L., Pinel, P., Clemenceau, S., Volle, E., Hasboun, D., et al. (2006). Direct intracranial, fMRI, and lesion evidence for the causal role of left inferotemporal cortex in reading. Neuron, 50, 191–204.
Gauthier, B., Eger, E., Hesselmann, G., Giraud, A. L., & Kleinschmidt, A. (2012). Temporal tuning properties along the human ventral visual stream. Journal of Neuroscience, 32, 14433–14441.
Grill-Spector, K., & Weiner, K. S. (2014). The functional architecture of the ventral temporal cortex and its role in categorization. Nature Reviews Neuroscience, 15, 536–548.
Hasson, U., Levy, I., Behrmann, M., Hendler, T., & Malach, R. (2002). Eccentricity bias as an organizing principle for human high-order object areas. Neuron, 34, 479–490.
Hasson, U., Yang, E., Vallines, I., Heeger, D. J., & Rubin, N. (2008). A hierarchy of temporal receptive windows in human cortex. Journal of Neuroscience, 28, 2539–2550.
Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4, 223–233.
Holcombe, A. O. (2009). Seeing slow and seeing fast: Two limits on perception. Trends in Cognitive Sciences, 13, 216–221.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology, 160, 106–154.
Hubel, D. H., & Wiesel, T. N. (1965). Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. Journal of Neurophysiology, 28, 229–289.
Huth, A. G., Nishimoto, S., Vu, A. T., & Gallant, J. L. (2012). A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron, 76, 1210–1224.
Jonas, J., Descoins, M., Koessler, L., Colnat-Coulbois, S., Sauvee, M., Guye, M., et al. (2012). Focal electrical intracerebral stimulation of a face-sensitive area causes transient prosopagnosia. Neuroscience, 222, 281–288.
Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17, 4302–4311.
Kelly, D. H. (1961a). Flicker fusion and harmonic analysis. Journal of the Optical Society of America, 51, 917–918.
Kelly, D. H. (1961b). Visual response to time-dependent stimuli. I. Amplitude sensitivity measurements. Journal of the Optical Society of America, 51, 422–429.
Kriegeskorte, N., Formisano, E., Sorger, B., & Goebel, R. (2007). Individual faces elicit distinct response patterns in human anterior temporal cortex. Proceedings of the National Academy of Sciences, U.S.A., 104, 20600–20605.
Kwong, K. K., Belliveau, J. W., Chesler, D. A., Goldberg, I. E., Weisskoff, R. M., Poncelet, B. P., et al. (1992). Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation. Proceedings of the National Academy of Sciences, U.S.A., 89, 5675–5679.
Levy, I., Hasson, U., Avidan, G., Hendler, T., & Malach, R. (2001). Center-periphery organization of human object areas. Nature Neuroscience, 4, 533–539.
Liu-Shuang, J., Ales, J. M., Rossion, B., & Norcia, A. M. (2015). The effect of contrast polarity reversal on face detection: Evidence of perceptual asymmetry from sweep VEP. Vision Research, 108, 8–19.
Liu-Shuang, J., Jonas, J., Ales, J., Norcia, A., Maillard, L., & Rossion, B. (2015). Relative sensitivity to low- versus high-level visual properties in face-sensitive regions of the human ventral occipito-temporal cortex: Evidence from intra-cerebral recording. Paper presented at the Vision Sciences Society.
Lopes da Silva, F. H., van Rotterdam, A., Storm van Leeuwen, W., & Tielen, A. M. (1970). Dynamic characteristics of visual evoked potentials in the dog. II. Beta frequency selectivity in evoked potentials and background activity. Electroencephalography and Clinical Neurophysiology, 29, 260–268.
Maurer, U., Brandeis, D., & McCandliss, B. D. (2005). Fast, visual specialization for reading in English revealed by the topography of the N170 ERP response. Behavioral and Brain Functions, 1, 13.
McKeeff, T. J., Remus, D. A., & Tong, F. (2007). Temporal limitations in object processing across the human ventral visual pathway. Journal of Neurophysiology, 98, 382–393.
McMains, S. A., & Somers, D. C. (2004). Multiple spotlights of attentional selection in human visual cortex. Neuron, 42, 677–686.
Meyers, E. M., Borzello, M., Freiwald, W. A., & Tsao, D. (2015). Intelligent information loss: The coding of facial identity, head pose, and non-face information in the macaque face patch system. Journal of Neuroscience, 35, 7069–7081.
Movshon, J. A., & Simoncelli, E. P. (2014). Representation of naturalistic image structure in the primate visual cortex. Cold Spring Harbor Symposia on Quantitative Biology, 79, 115–122.
Movshon, J. A., Thompson, I. D., & Tolhurst, D. J. (1978). Spatial and temporal contrast sensitivity of neurones in areas 17 and 18 of the cat's visual cortex. Journal of Physiology, 283, 101–120.
Mukamel, R., Harel, M., Hendler, T., & Malach, R. (2004). Enhanced temporal non-linearities in human object-related occipito-temporal cortex. Cerebral Cortex, 14, 575–585.
Norcia, A. M., Appelbaum, L. G., Ales, J. M., Cottereau, B. R., & Rossion, B. (2015). The steady-state visual evoked potential in vision research: A review. Journal of Vision, 15, 4.
Ozus, B., Liu, H. L., Chen, L., Iyer, M. B., Fox, P. T., & Gao, J. H. (2001). Rate dependence of human visual cortical response due to brief stimulation: An event-related fMRI study. Magnetic Resonance Imaging, 19, 21–25.
Parvizi, J., Jacques, C., Foster, B. L., Witthoft, N., Rangarajan, V., Weiner, K. S., et al. (2012). Electrical stimulation of human fusiform face-selective regions distorts face perception. Journal of Neuroscience, 32, 14915–14920.
Pegna, A. J., Khateb, A., Michel, C. M., & Landis, T. (2004). Visual recognition of faces, objects, and words using degraded stimuli: Where and when it occurs. Human Brain Mapping, 22, 300–311.
Portilla, J., & Simoncelli, E. P. (2000). A parametric texture model based on joint statistics of complex wavelet coefficients. International Journal of Computer Vision, 40, 49–71.
Potter, M. C., & Faulconer, B. A. (1975). Time to understand pictures and words. Nature, 253, 437–438.
Rossion, B., Joyce, C. A., Cottrell, G. W., & Tarr, M. J. (2003). Early lateralization and orientation tuning for face, word, and object processing in the visual cortex. Neuroimage, 20, 1609–1624.
Rust, N. C., & Dicarlo, J. J. (2010). Selectivity and tolerance ("invariance") both increase as visual information propagates from cortical area V4 to IT. Journal of Neuroscience, 30, 12978–12995.
Schweinberger, S. R., Huddy, V., & Burton, A. M. (2004). N250r: A face-selective brain response to stimulus repetitions. NeuroReport, 15, 1501–1505.
Schweinberger, S. R., Pickering, E. C., Jentzsch, I., Burton, A. M., & Kaufmann, J. M. (2002). Event-related brain potential evidence for a response of inferior temporal cortex to familiar face repetitions. Brain Research, Cognitive Brain Research, 14, 398–409.
Scott, L. S., Tanaka, J. W., Sheinberg, D. L., & Curran, T. (2006). A reevaluation of the electrophysiological correlates of expert object processing. Journal of Cognitive Neuroscience, 18, 1453–1465.
Scott, L. S., Tanaka, J. W., Sheinberg, D. L., & Curran, T. (2008). The role of category learning in the acquisition and retention of perceptual expertise: A behavioral and neurophysiological study. Brain Research, 1210, 204–215.
Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523–568.
Thomas, C. G., & Menon, R. S. (1998). Amplitude response and stimulus presentation frequency response of human primary visual cortex using BOLD EPI at 4 T. Magnetic Resonance in Medicine, 40, 203–209.
Wandell, B. A., Rauschecker, A. M., & Yeatman, J. D. (2012). Learning to see words. Annual Review of Psychology, 63, 31–53.
Yeatman, J. D., Rauschecker, A. M., & Wandell, B. A. (2013). Anatomy of the visual word form area: Adjacent cortical circuits and long-range white matter connections. Brain and Language, 125, 146–155.
Zhu, X. H., Kim, S. G., Andersen, P., Ogawa, S., Ugurbil, K., & Chen, W. (1998). Simultaneous oxygenation and perfusion imaging study of functional activity in primary visual cortex at different visual stimulation frequency: Quantitative correlation between BOLD and CBF changes. Magnetic Resonance in Medicine, 40, 703–711.