Social–emotional cues, such as affective vocalizations and emotional faces, automatically elicit emotional action tendencies. Adaptive social–emotional behavior depends on the ability to control these automatic action tendencies. It remains unknown whether neural control over automatic action tendencies is supramodal or relies on parallel modality-specific neural circuits. Here, we address this largely unexplored issue in humans. We consider neural circuits supporting emotional action control in response to affective vocalizations, using an approach–avoidance task known to reliably index control over emotional action tendencies elicited by emotional faces. We isolate supramodal neural contributions to emotional action control through a conjunction analysis of control-related neural activity evoked by auditory and visual affective stimuli, the latter from a previously published data set obtained in an independent sample. We show that the anterior pFC (aPFC) supports control of automatic action tendencies in a supramodal manner, that is, triggered by either emotional faces or affective vocalizations. When affective vocalizations are heard and emotional control is required, the aPFC supports control through negative functional connectivity with the posterior insula. When emotional faces are seen and emotional control is required, control relies on the same aPFC territory downregulating the amygdala. The findings provide evidence for a novel mechanism of emotional action control with a hybrid hierarchical architecture, relying on a supramodal node (aPFC) implementing an abstract goal by modulating modality-specific nodes (posterior insula, amygdala) involved in signaling motivational significance of either affective vocalizations or faces.