Human speech is composed of two types of information, related to content (lexical information, i.e., “what” is being said [e.g., words]) and to the speaker (indexical information, i.e., “who” is talking [e.g., voices]). The extent to which lexical versus indexical information is represented separately or integrally in the brain is unresolved. In the current experiment, we use short-term fMRI adaptation to address this issue. Participants performed a loudness judgment task during which single or multiple sets of words/pseudowords were repeated with single (repeat) or multiple talkers (speaker-change) conditions while BOLD responses were collected. As reflected by adaptation fMRI, the left posterior middle temporal gyrus, a crucial component of the ventral auditory stream performing sound-to-meaning computations (“what” pathway), showed sensitivity to lexical as well as indexical information. Previous studies have suggested that speaker information is abstracted during this stage of auditory word processing. Here, we demonstrate that indexical information is strongly coupled with word information. These findings are consistent with a plethora of behavioral results that have demonstrated that changes to speaker-related information can influence lexical processing.