Object recognition requires dynamic transformations of low-level visual inputs to complex semantic representations. Although this process depends on the ventral visual pathway, we lack an incremental account from low-level inputs to semantic representations and the mechanistic details of these dynamics. Here we combine computational models of vision with semantics and test the output of the incremental model against patterns of neural oscillations recorded with magnetoencephalography in humans. Representational similarity analysis showed visual information was represented in low-frequency activity throughout the ventral visual pathway, and semantic information was represented in theta activity. Furthermore, directed connectivity showed visual information travels through feedforward connections, whereas visual information is transformed into semantic representations through feedforward and feedback activity, centered on the anterior temporal lobe. Our research highlights that the complex transformations between visual and semantic information is driven by feedforward and recurrent dynamics resulting in object-specific semantics.