During conversation, speakers monitor their own and others' output so they can alter their production adaptively, including halting it if needed. We investigated the neural mechanisms of monitoring and halting in spoken word production by employing a modified stop signal task during fMRI. Healthy participants named target pictures and withheld their naming response when presented with infrequent auditory words as stop signals. We also investigated whether the speech comprehension system monitors inner (i.e., prearticulatory) speech via the output of phonological word form encoding, as proposed by the perceptual loop theory [Levelt, W. J. M. Speaking: From intention to articulation. Cambridge, MA: MIT Press, 1989], by presenting stop signals phonologically similar to the target picture name (e.g., cabbage–CAMEL). The contrast of successful halting versus naming revealed extensive BOLD signal responses in bilateral inferior frontal gyrus, preSMA, and superior temporal gyrus. Successful versus unsuccessful halting of speech was associated with increased BOLD signal bilaterally in the posterior middle temporal, frontal, and parietal lobes and with decreased BOLD signal bilaterally in the posterior and left anterior superior temporal gyrus and right inferior frontal gyrus. These results show, for the first time, the neural mechanisms engaged during both monitoring and interrupting speech production. However, we failed to observe any differential effects of phonological similarity in either the behavioral or neural data, suggesting that monitoring of inner versus external speech might involve different mechanisms.