Sensitivity to temporal change places fundamental limits on object processing in the visual system. An emerging consensus from the behavioral and neuroimaging literature suggests that temporal resolution differs substantially for stimuli of different complexity and for brain areas at different levels of the cortical hierarchy. Here, we used steady-state visually evoked potentials to directly measure three fundamental parameters that characterize the underlying neural response to text and face images: temporal resolution, peak temporal frequency, and response latency. We presented full-screen images of text or a human face, alternated with a scrambled image, at temporal frequencies between 1 and 12 Hz. These images elicited a robust response at the first harmonic, and this response differed in tuning, scalp topography, and delay between the text and face images. Face-selective responses were maximal at 4 Hz, whereas text-selective responses were maximal at 1 Hz. The topography of the text image response was strongly left-lateralized at higher stimulation rates, whereas the response to the face image was slightly right-lateralized but nearly bilateral at all frequencies. Both text and face images elicited steady-state activity at more than one apparent latency; we observed early (141–160 msec) and late (>250 msec) text- and face-selective responses. These differences in temporal tuning profiles are likely to reflect differences in the nature of the computations performed by word- and face-selective cortex. Despite the close proximity of word- and face-selective regions on the cortical surface, our measurements demonstrate substantial differences in the temporal dynamics of word- versus face-selective responses.
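As an illustration of the quantities named above, the following is a minimal sketch of how a first-harmonic response and an apparent latency can be computed from steady-state data. It is not the analysis pipeline used in this study, which the abstract does not specify. It assumes the common convention that apparent latency is the negative slope of response phase against stimulation frequency, divided by 2π. The function names (`first_harmonic`, `apparent_latency`), the sampling rate, the epoch length, and the synthetic 150 msec delay are all hypothetical choices made for the example.

```python
import numpy as np

def first_harmonic(signal, fs, f_stim):
    """Complex Fourier coefficient of `signal` at the stimulation frequency.

    The factor 2/N scales the result so that its magnitude equals the
    amplitude of a real sinusoid at f_stim (for an integer number of cycles).
    """
    t = np.arange(len(signal)) / fs
    return 2.0 * np.mean(signal * np.exp(-2j * np.pi * f_stim * t))

def apparent_latency(freqs_hz, phases_rad):
    """Apparent latency in seconds from the slope of phase versus frequency.

    Assumes a single fixed delay, so that phase = -2*pi*f*latency + constant.
    """
    phases = np.unwrap(phases_rad)            # remove 2*pi jumps between frequencies
    slope, _ = np.polyfit(freqs_hz, phases, 1)
    return -slope / (2.0 * np.pi)

# Synthetic demonstration (assumed values, not data from the study).
fs = 500.0                                    # sampling rate in Hz (assumption)
duration_s = 2.0                              # epoch length in seconds (assumption)
true_delay_s = 0.150                          # simulated neural delay (assumption)
freqs_hz = np.arange(1.0, 13.0)               # 1 to 12 Hz, as in the stimulation range
t = np.arange(int(fs * duration_s)) / fs

coeffs = np.array([
    first_harmonic(np.cos(2 * np.pi * f * (t - true_delay_s)), fs, f)
    for f in freqs_hz
])
amplitudes = np.abs(coeffs)
latency_s = apparent_latency(freqs_hz, np.angle(coeffs))
print(f"estimated apparent latency: {latency_s * 1000:.1f} msec")
```

A single fitted slope yields a single latency, so this sketch covers only the one-delay case. Responses with more than one apparent latency, as reported here, would require fitting separate frequency ranges or response components.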