In social interactions, it is often necessary to rapidly encode the association between visually presented faces and auditorily presented names. The present study used event-related potentials to examine the neural correlates of associative encoding for multimodal face–name pairs. We assessed study-phase processes leading to high-confidence recognition of correct pairs (and consistent rejection of recombined foils) as compared to lower-confidence recognition of correct pairs (with inconsistent rejection of recombined foils) and recognition failures (misses). Both high- and low-confidence retrieval of face–name pairs were associated with study-phase activity suggestive of item-specific processing of the face (posterior inferior temporal negativity) and name (fronto-central negativity). However, only those pairs later retrieved with high confidence recruited a sustained centro-parietal positivity that an ancillary localizer task suggested may index an association-unique process. Additionally, we examined how these processes were influenced by massed repetition, a mnemonic strategy commonly employed in everyday situations to improve face–name memory. Differences in subsequent memory effects across repetitions suggested that associative encoding was strongest at the initial presentation, and thus that the initial presentation has the greatest impact on memory formation. Yet exploratory analyses suggested that the third presentation may have benefited later memory by providing an opportunity for extended processing of the name. Thus, although encoding of the initial presentation was critical for establishing a strong association, sustained processing across subsequent immediate (massed) presentations may provide additional encoding opportunities for the less dominant stimulus dimension (i.e., the name), which may in turn help differentiate face–name pairs from similar (recombined) pairs.