We report results from two experiments in which subjects had to categorize briefly presented upright or inverted natural scenes. In the first experiment, subjects decided whether images contained animals or human faces presented at different scales. Behavioral results showed virtually identical processing speed for the two categories and very limited effects of inversion. One type of event-related potential (ERP) comparison, potentially capturing low-level physical differences, showed large effects with onsets at about 150 msec in the animal task. In the human face task, however, those differences started as early as 100 msec. In the second experiment, subjects responded to close-up views of animal faces or human faces in an attempt to limit physical differences between image sets. This manipulation almost completely eliminated small differences before 100 msec in both tasks. But again, despite very similar behavioral performance and short reaction times in both tasks, human faces were associated with earlier ERP differences than animal faces. Finally, in both experiments, as an alternative way to determine processing speed, we compared the ERPs elicited by the same images when seen as targets and as nontargets in different tasks. Surprisingly, all task-dependent ERP differences had relatively long latencies. We conclude that task-dependent ERP differences fail to capture object processing speed, at least for some categories such as faces. We discuss models of object processing that might explain our results, as well as alternative approaches.