While listening to meaningful speech, auditory input is processed more rapidly near the end (vs. beginning) of sentences. Although several studies have shown such word-to-word changes in auditory input processing, it is still unclear from which processing level these word-to-word dynamics originate. We investigated whether predictions derived from sentential context can result in auditory word-processing dynamics during sentence tracking. We presented healthy human participants with auditory stimuli consisting of word sequences, arranged into either predictable (coherent sentences) or less predictable (unstructured, random word sequences) 42-Hz amplitude-modulated speech, and a continuous 25-Hz amplitude-modulated distractor tone. We recorded RTs and frequency-tagged neuroelectric responses (auditory steady-state responses) to individual words at multiple temporal positions within the sentences, and quantified sentential context effects at each position while controlling for individual word characteristics (i.e., phonetics, frequency, and familiarity). We found that sentential context increasingly facilitates auditory word processing as evidenced by accelerated RTs and increased auditory steady-state responses to later-occurring words within sentences. These purely top–down contextually driven auditory word-processing dynamics occurred only when listeners focused their attention on the speech and did not transfer to the auditory processing of the concurrent distractor tone. These findings indicate that auditory word-processing dynamics during sentence tracking can originate from sentential predictions. The predictions depend on the listeners' attention to the speech, and affect only the processing of the parsed speech, not that of concurrently presented auditory streams.