Processing of complex visual stimuli comprising facial movements, hand actions, and body movements is known to occur in the superior temporal sulcus (STS) of humans and nonhuman primates. The STS is also thought to play a role in the integration of multimodal sensory input. We investigated whether STS neurons coding the sight of actions also integrated the sound of those actions. For 23% of neurons responsive to the sight of an action, the sound of that action significantly modulated the visual response. The sound of the action increased or decreased the visually evoked response for an equal number of neurons. In the neurons whose visual response was increased by the addition of sound (but not those neurons whose responses were decreased), the audiovisual integration was dependent upon the sound of the action matching the sight of the action. These results suggest that neurons in the STS form multisensory representations of observed actions.