Visual search is controlled by representations of target objects (attentional templates). Such templates are often activated in response to verbal descriptions of search targets, but it is unclear whether search can be guided effectively by such verbal cues. We measured ERPs to track the activation of attentional templates for new target objects defined by word cues. On each trial run, a word cue was followed by three search displays that contained the cued target object among three distractors. Targets were detected more slowly in the first display of each trial run, and the N2pc component (an ERP marker of attentional target selection) was attenuated and delayed for the first presentation of a particular target object relative to the two subsequent presentations, demonstrating limitations in the ability of word cues to activate effective attentional templates. N2pc components to target objects in the first display were strongly affected by differences in object imageability (i.e., the ability of word cues to activate a target-matching visual representation). These differences were no longer present for the second presentation of the same target objects, indicating that a single perceptual encounter is sufficient to activate a precise attentional template. Our results demonstrate the superiority of visual over verbal target specifications in the control of visual search, show that verbal descriptions are more effective for some objects than others, and suggest that the attentional templates that guide search for particular real-world target objects are analog visual representations.