Real-world navigation requires movement of the body through space, producing a continuous stream of visual and self-motion signals, including proprioceptive, vestibular, and motor efference cues. These multimodal cues are integrated to form a spatial cognitive map, an abstract, amodal representation of the environment. How the brain combines these disparate inputs, and how much each contributes to cognitive map formation and recall, remain key unresolved questions in cognitive neuroscience. Recent advances in virtual reality technology allow participants to experience body-based cues while navigating virtually, making it possible to examine these issues in new detail. Here, we discuss a recent publication that addresses some of them (D. J. Huffman and A. D. Ekstrom. A modality-independent network underlies the retrieval of large-scale spatial environments in the human brain. Neuron, 104, 611–622, 2019). In doing so, we also review recent progress in the study of human spatial cognition and raise several questions that might be addressed in future studies.