Although people do not normally try to remember associations between faces and physical contexts, these associations are established automatically, as indicated by the difficulty of recognizing familiar faces in different contexts (“butcher-on-the-bus” phenomenon). The present fMRI study investigated the automatic binding of faces and scenes. In the face–face (F–F) condition, faces were presented alone during both encoding and retrieval, whereas in the face/scene–face (FS–F) condition, they were presented overlaid on scenes during encoding but alone during retrieval (context change). Although participants were instructed to focus only on the faces during both encoding and retrieval, recognition performance was worse in the FS–F than in the F–F condition (“context shift decrement” [CSD]), confirming automatic face–scene binding during encoding. This binding was mediated by the hippocampus as indicated by greater subsequent memory effects (remembered > forgotten) in this region for the FS–F than the F–F condition. Scene memory was mediated by right parahippocampal cortex, which was reactivated during successful retrieval when the faces were associated with a scene during encoding (FS–F condition). Analyses using the CSD as a regressor yielded a clear hemispheric asymmetry in medial temporal lobe activity during encoding: Left hippocampal and parahippocampal activity was associated with a smaller CSD, indicating more flexible memory representations immune to context changes, whereas right hippocampal/rhinal activity was associated with a larger CSD, indicating less flexible representations sensitive to context change. Taken together, the results clarify the neural mechanisms of context effects on face recognition.