An Important Step toward Understanding the Role of Body-based Cues on Human Spatial Memory for Large-Scale Environments

Abstract Moving our body through space is fundamental to human navigation; however, technical and physical limitations have hindered our ability to study the role of these body-based cues experimentally. We recently designed an experiment using novel immersive virtual-reality technology, which allowed us to tightly control the availability of body-based cues to determine how these cues influence human spatial memory [Huffman, D. J., & Ekstrom, A. D. A modality-independent network underlies the retrieval of large-scale spatial environments in the human brain. Neuron, 104, 611–622, 2019]. Our analysis of behavior and fMRI data revealed a similar pattern of results across a range of body-based cue conditions, thus suggesting that participants likely relied primarily on vision to form and retrieve abstract, holistic representations of the large-scale environments in our experiment. We ended our paper by discussing a number of caveats and future directions for research on the role of body-based cues in human spatial memory. Here, we reiterate and expand on this discussion, and we use a commentary in this issue by A. Steel, C. E. Robertson, and J. S. Taube (Current promises and limitations of combined virtual reality and functional magnetic resonance imaging research in humans: A commentary on Huffman and Ekstrom (2019). Journal of Cognitive Neuroscience, 2020) as a helpful discussion point regarding some of the questions that we think will be the most interesting in the coming years. We highlight the exciting possibility of taking a more naturalistic approach to study the behavior, cognition, and neuroscience of navigation. Moreover, we share the hope that researchers who study navigation in humans and nonhuman animals will synergize to provide more rapid advancements in our understanding of cognition and the brain.


INTRODUCTION
One of the major issues in cognitive neuroscience involves relating neural signals to the types of behaviors we encounter during real-world situations, such as navigating to our local supermarket to find our favorite foods. Steel, Robertson, and Taube (2020) discuss one important issue when considering modeling real-world navigation in the laboratory: how we account for body movements in navigation experiments that use virtual reality (VR). In particular, they critiqued one of our recent papers (Huffman & Ekstrom, 2019a) in which we showed that the spatial representations of well-learned environments were similar across a range of body-based cue conditions. Thus, our results suggested that participants might rely more strongly on visual cues to form and retrieve memories of large-scale spatial environments. Steel et al. (2020) challenged the validity of our approach by arguing that (1) the use of immersive VR is unnatural, (2) our results are limited because we did not find a difference in behavioral performance during fMRI scanning, (3) our behavioral tasks did not adequately assess spatial representations, and (4) our theoretical aims involve a false dichotomy. Careful evaluation of each of their critiques, however, reveals that our findings are robust to their concerns. Moreover, we aim to clarify the rationale behind our experimental design and to highlight some of the key differences between our study (in humans) and studies on the neuroscience of navigation in rodents. In particular, we discuss several approaches that we think could enhance our understanding of how we represent large-scale, ecologically relevant spatial environments under naturalistic behavioral demands. Thus, we will interleave our discussion of Steel et al.'s (2020) criticisms within the broader framework of experimental designs that seek to understand how humans form and retrieve memories of large-scale spatial environments, such as the towns in which we live.

WHAT DID WE FIND IN OUR PREVIOUS PAPER?
In this section, we will briefly review our experimental design as well as our results and discussion (Huffman & Ekstrom, 2019a). In our experiment, participants learned three virtual cities under three levels of body-based cues: (1) impoverished: participants stood on an omnidirectional treadmill and viewed the environment via a head-mounted display, but they controlled all of their navigation via a joystick; (2) limited: rotations and head movements were yoked to real-world movements via a head-mounted display, but they moved forward using a joystick; (3) enriched: rotations and head movements were yoked to real-world movements via a head-mounted display, and participants moved forward in the environment by walking on an omnidirectional treadmill. Participants learned three cities to criterion performance (based on their abstract, holistic knowledge of the spatial environment as assessed by performance of the judgments of relative direction [JRD] task) before undergoing fMRI scanning. During fMRI scanning, participants performed the JRD task as well as a perceptually matched active baseline task (a math task that looked visually similar to the JRD task and involved similar button presses) and a resting-state task.
We analyzed our data using a variety of approaches, including (1) a Bayesian analysis of task performance across multiple measures, (2) a classification analysis of putative network interactions, (3) a classification analysis of single-trial patterns of activity in ROIs (the hippocampus, parahippocampal cortex, and retrosplenial cortex), (4) an activation analysis, (5) a whole-brain Bayesian analysis, and (6) a pattern similarity analysis investigating distance-related coding. Importantly, the results generated from all of these analyses revealed that behavioral performance and the patterns of brain activity were similar across the three body-based cue conditions. Moreover, we used a machine learning approach of "generalization testing" to assess whether patterns of activity generalized between different, but related, conditions (e.g., between different JRD task conditions). We would like to emphasize that such an approach is beneficial because finding evidence for the similarity of conditions (i.e., similar generalization performance between conditions) is stronger evidence that the brain is treating different conditions similarly than relying on Bayesian effects alone (e.g., Bayes null). Importantly, we found similar generalization performance (and strong correlations) between the different JRD tasks as a function of body-based cues during initial learning and our perceptually matched active baseline task and the resting state for (1) our network-based analysis and (2) single-trial classification analysis within our ROIs and the whole brain. Altogether, we concluded that participants likely relied primarily on vision when they retrieved information about large-scale spatial environments. We highlighted caveats and future directions, which we will reiterate and expand upon below.
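The logic of generalization testing described above can be illustrated with a minimal sketch: a classifier is trained on trials from one condition and then tested on trials from a different, related condition. The sketch is not the analysis pipeline from the original paper. The nearest-centroid classifier, the synthetic "patterns," and all names (`synth_trials`, `generalization_accuracy`, `cond_a`, `cond_b`) are hypothetical choices made purely for illustration.

```python
import random


def synth_trials(rng, n_trials, task_pattern, noise_sd=1.0):
    """Synthetic trials (illustrative only): label 1 adds task_pattern to noise, label 0 is noise."""
    trials = []
    for i in range(n_trials):
        label = i % 2
        pattern = [rng.gauss(0.0, noise_sd) + label * w for w in task_pattern]
        trials.append((pattern, label))
    return trials


def centroids(trials):
    """Mean pattern for each label in the training condition."""
    sums, counts = {}, {}
    for pattern, label in trials:
        acc = sums.setdefault(label, [0.0] * len(pattern))
        for j, value in enumerate(pattern):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(cents, pattern):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(cents, key=lambda lab: sqdist(cents[lab], pattern))


def generalization_accuracy(train, test):
    """Train on one condition, test on another; return proportion correct."""
    cents = centroids(train)
    hits = sum(predict(cents, pattern) == label for pattern, label in test)
    return hits / len(test)


rng = random.Random(0)
# Assumption: both conditions share the same underlying task-related pattern.
task_pattern = [rng.gauss(0.0, 1.0) for _ in range(50)]
cond_a = synth_trials(rng, 100, task_pattern)  # e.g., trials from one learning condition
cond_b = synth_trials(rng, 100, task_pattern)  # e.g., trials from another learning condition
acc = generalization_accuracy(cond_a, cond_b)
print(f"Cross-condition generalization accuracy: {acc:.2f}")
```

In this toy setting, high cross-condition accuracy arises because both conditions were built from the same underlying pattern. That is the sense in which successful generalization counts as positive evidence that two conditions are represented similarly.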

VR TECHNOLOGY ALLOWS FOR TIGHT EXPERIMENTAL CONTROL WHILE ALSO OFFERING A HIGHLY IMMERSIVE EXPERIENCE
In this section, we will discuss Steel et al.'s (2020) first criticism: that the use of immersive VR with real-world head rotations and leg movements is completely unnatural. In our paper (Huffman & Ekstrom, 2019a), we used VR to tightly control the amount of exposure and the level of body-based cues with which a participant learned large-scale virtual environments to determine the influence of body-based cues on human spatial memory during retrieval. Although we agree with Steel et al. (2020) that navigation on an omnidirectional treadmill is not a complete substitution for real-world body movements, the use of this technology provided critical scientific control for our experiment. For example, we were able to directly match the visual features and the immersive feeling of wearing the head-mounted display across the three body-based cue conditions.
We tested the idea that body-based cues exist along a continuum, including designs that (1) do not include any body-based cues (e.g., verbal communication of spatial information), (2) include only optic flow (e.g., desktop-based navigation), (3) use a head-mounted display but use a joystick for all movements (e.g., our "impoverished" condition), (4) employ a head-mounted display and yoke head and body movements to real-world body movements (e.g., our "limited" condition), (5) employ a head-mounted display and an omnidirectional treadmill to simulate many of the relevant cues for walking (e.g., our "enriched" condition), and (6) employ real-world body translations and rotations. Thus, we assumed that participants could make use of any (and all) of the body-based cues that were available to them, perhaps most critically, including real-world body rotations (see the green plot in Figure 1A). In contrast, Steel et al. (2020) argued that our participants likely ignored body-based cues in our paradigm, thus advocating for a model in which body-based cues cannot be used by participants unless they involve conditions with strictly real-world locomotion (see the purple plot in Figure 1A). As we discuss below, there are several lines of evidence to suggest that our participants could have, in fact, used the body-based cues that were available to them in our limited and enriched conditions, contradicting arguments from Steel et al. (2020).
Specifically, we included a condition ("limited") in which participants stood on the treadmill but they actively moved their head and body to explore the virtual environment. In particular, this condition would activate the semicircular canals in the same manner as it would with real-world movement. For example, previous research has shown similar head-direction coding under conditions of passive rotation during constant location (e.g., Shinder & Taube, 2011), which would suggest that our active rotation conditions should have been sufficient to activate the head-direction system in humans. Moreover, Steel et al. (2020) cited and discussed other studies (e.g., Robertson, Hermann, Mynick, Kravitz, & Kanwisher, 2016; Shine, Valdés-Herrera, Hegarty, & Wolbers, 2016) that are equivalent to our "limited" condition that have also advanced our knowledge of spatial representations. They suggested that these designs could have allowed participants to reactivate body-based cues during retrieval when the rotations in these studies (with a head-mounted display) were identical to those employed in our study. Therefore, it is not clear why they argue that participants in our "limited" (and "enriched") condition could not have also used and reactivated these real-world rotational cues. Steel et al. (2020) also raised questions about how naturally omnidirectional treadmills replicate real-world walking in the "enriched" condition. Although we described our approach in the methods of our paper, we would like to emphasize that we used a detailed procedure in which we attempted to train participants to walk as naturally as possible. We first trained participants to walk on the omnidirectional treadmill without the head-mounted display. Once we determined that they were walking naturally, we then had them don the head-mounted display and perform several practice tasks to teach them to be able to comfortably navigate on the treadmill. 
Participants then returned another day for the main task session. Thus, before participants began to learn the environments for the fMRI experiment, they already had between 1.5 and 2.5 hr of experience walking on the treadmill. We also required participants to walk as naturally as possible throughout the experiment. In addition, we ensured that our interface between the treadmill and the VR software was set to approximate average human walking speed. Finally, we ensured that the treadmill responded accurately and dynamically to changes in the participant's walking speed (e.g., the faster a participant walked on the treadmill, the faster they moved in the virtual environment).
Previous research with animals as varied as desert ants (e.g., Dahmen, Wahl, Pfeffer, Mallot, & Wittlinger, 2017), rats (e.g., Aronov & Tank, 2014), and humans (e.g., Harootonian, Wilson, Hejtmánek, Ziskin, & Ekstrom, 2020) has shown that animals can use information from their body-based cues to navigate on omnidirectional treadmills. For example, previous research in the desert ant showed that they could accurately walk the distance and direction to a home location while walking on an omnidirectional treadmill (Dahmen et al., 2017). Although this is an extreme example because the desert ant is thought to rely heavily on body-based cues (e.g., step counting; Wehner, 2020; Wittlinger, Wehner, & Wolf, 2006), such findings suggest that if animals (e.g., humans, rats) make use of information from taking steps (e.g., proprioceptive feedback and motor-efference copies) and head rotations (e.g., vestibular information), then omnidirectional treadmills can provide access to at least some of these relevant cues. For example, previous research has shown that rats can accurately navigate to target locations in a virtual version of the Morris Water Maze using an omnidirectional treadmill (Aronov & Tank, 2014). Moreover, previous research has suggested that human participants can accurately perform a task that is thought to measure path integration while walking on the same omnidirectional treadmill that we used in our experiment (specifically, a triangle-completion task; Harootonian et al., 2020). Importantly, the Harootonian et al. (2020) study solely used body-based cues (i.e., no visual cues), thus providing direct support for the notion that participants can and do use information from body-based cues to navigate on omnidirectional treadmills.
Figure 1. The influence of body-based cues might differ between the navigation interface, the nature of the behavioral task, and the spatial scale of the environment. (A) We tested a model that suggests that a participant's ability to use body-based cues exists along a continuum. Specifically, we suggested that participants can use any body-based cues that are available to them; thus, our "limited" condition (rotations: real-world body rotations; translation: joystick) and "enriched" condition (rotations: real-world body rotations; translation: omnidirectional treadmill) would give participants access to useful information about real-world body rotations and about taking steps on the treadmill. In contrast, Steel et al. (2020) argue that participants cannot use body-based cues unless they are physically moving their bodies through space (i.e., real-world navigation). Thus, in their model, participants use body-based cues neither from real-world body rotations in immersive VR nor from taking steps on an omnidirectional treadmill. (B) We suggest that the influence of body-based cues on human spatial cognition might differ based on the nature of the behavioral task or on the spatial scale of the environment. Specifically, body-based cues might exert a stronger influence on tasks that emphasize a navigator's ability to keep track of themselves relative to another object in the environment (i.e., more egocentric tasks) as opposed to tasks that encourage participants to form holistic, abstract representations of the environment (i.e., more allocentric tasks). In addition, because of the accumulation of error in the path integration system, body-based cues might exert a stronger effect in smaller-scale environments. HMD = head-mounted display.
More relevant for the discussion of the neuroscience of navigation, previous research revealed the full complement of spatially selective cells when rats walked on an omnidirectional treadmill that was very similar to our treadmill (Aronov & Tank, 2014). Specifically, this study reported similar place cells, head-direction cells, and grid cells between their VR condition and the real world. Moreover, they reported evidence of border cells and of remapping of place cells between similar environments. Although these cellular findings in rodents clearly cannot speak directly to our study in humans involving fMRI, these findings suggest that if similar mechanisms are at play in their apparatus in rats and in our apparatus in humans, then we might expect that our enriched condition on the treadmill would reveal similar spatial representations to real-world navigation in the human brain. Of course, such a prediction awaits future experimentation, for example, in patients with implanted electrodes (e.g., Topalovic et al., 2020; Aghajan et al., 2017; Bohbot, Copara, Gotman, & Ekstrom, 2017). Like Steel et al. (2020), we agree that such experiments will be fundamental to increasing our understanding of the role of body-based cues on human spatial memory. On the basis of the evidence reviewed above, however, we disagree with Steel et al. (2020) that participants could not have used any of the relevant body-based cues in our experiment. Therefore, we argue that our approach allowed us to study part of the larger continuum of body-based cues, although undoubtedly future experiments will be helpful in further testing these predictions (i.e., the comparison between our models in Figure 1A).
We also want to highlight one of the main benefits of using an omnidirectional treadmill in the laboratory in the first place: It allowed us to fit a city-sized environment into a small room. Importantly, as we will discuss later, these larger-scale environments are the kinds of spaces we are most interested in studying because they relate to our ability to navigate over ecologically and evolutionarily relevant dimensions. In addition, similar to more traditional laboratory-based tasks, VR allows tight experimental control over the degree of exposure to an environment, and it allows experimenters to gather detailed data regarding a participant's full exploration history within such environments. Moreover, VR allows researchers to create experiences that would be difficult or impossible to create in the real world (e.g., to study the strategies underlying human navigation; Warren, Rothman, Schnapp, & Ericson, 2017).

IT IS IMPERATIVE TO MATCH BEHAVIORAL PERFORMANCE BETWEEN CONDITIONS TO MAKE ANY CONCLUSIONS ABOUT THE ROLE OF BODY-BASED CUES ON THE NEURAL REPRESENTATION OF SPATIAL INFORMATION
In this section, we will address Steel et al.'s (2020) second criticism: that our results are limited because we did not find a difference in behavioral performance during fMRI scanning. In contrast to their view, we argue that it is imperative that researchers match behavioral performance when looking at the role of body-based cues on brain responses to spatially guided behavior. In fact, this is one area that we want to highlight as a seeming misunderstanding of our overall approach. Specifically, if we had sent participants into the scanner with differing levels of spatial memory performance, then if we had observed differences in their brain as a function of how they originally learned the environment, we could not have deconfounded whether these brain differences were caused by a difference of the quality of spatial memory retrieval (i.e., a memory effect) versus the effect of body-based cues per se (i.e., the mode of locomotion through the environment). That is, such an effect could be caused by an artifact of a failure to retrieve spatial information. For example, if you were asked to perform spatial memory tasks for an environment that you have never visited, then, of course, your brain would show little to no retrieval or spatial-like coding. If we want to know, however, if the role of body-based cues mattered per se, then we could study, for example, if different networks supported the retrieval of an environment that you had walked versus an environment that you had navigated via other mechanisms. Thus, we designed our experiment using this logic of purposely matching performance between the body-based cue conditions before beginning fMRI data collection. Therefore, to ensure that our point is perfectly clear, the fact that participants had equal behavioral performance during the fMRI session was an intended consequence of our training paradigm and was in no way an artifact.
We would like to further highlight the importance of matching behavioral performance between conditions in studies in the rodent. For example, many studies of spatial coding in the rodent brain are done in the complete absence of any overt behavioral demands. Specifically, rodents are often participating in a random foraging experiment, where they are walking around without any specific task. Moreover, in some studies of the role of body-based cues, rodents are then put into stressful situations (e.g., being wrapped in a towel [i.e., a makeshift straightjacket] and then being passively transported through the environment; e.g., Foster, Castro, & McNaughton, 1989). Thus, under such conditions, it is difficult to know anything about the animal's mental processes. Is there any behavioral benefit for them to keep track of their location in the environment? We argue that it is paramount for researchers to have animals perform the same behavioral task between body-based cue conditions so that confounds such as behavioral performance, strategy use, and attention to spatial information can be mitigated. In fact, previous research in humans has suggested that behavioral demands can shape performance (and likely the underlying brain representations) to become more modality independent (e.g., between visual and proprioceptive conditions: Experiment 3 of Avraamides, Loomis, Klatzky, & Golledge, 2004). Therefore, if brain differences are observed under conditions of matched behavioral performance, then it would provide more compelling evidence that body-based cues (i.e., mode of locomotion) significantly contribute to spatial coding (e.g., "Where am I?").
More generally, we and others have highlighted the importance of disentangling active versus passive task performance from the role of body-based cues on spatial representations (e.g., Huffman & Ekstrom, 2019a; Chrastil & Warren, 2012, 2015). Briefly, many rodent studies of the role of body-based cues have also excluded active navigation strategies. That is, active versus passive differences confound the possible contributions of decision-making with the role of body-based cues. Thus, in our experiment, participants in all three conditions were performing the same active navigation task and had the same behavioral demands to form stable, abstract spatial representations of the environment. Therefore, we agree with Chrastil and Warren (2012) that studies investigating the role of body-based cues should seek to deconfound the role of these cues from the role of active decision-making.

THE IMPORTANCE OF THE TYPE OF BEHAVIORAL TASK
In this section, we will address Steel et al.'s (2020) third criticism: that our behavioral tasks did not adequately assess spatial representations. The JRD task is one of the most well-characterized and widely used tasks in the human spatial navigation literature because it provides access to holistic representations of space (Vass & Epstein, 2017; Waller & Hodgson, 2006; Mou, McNamara, Valiquette, & Rump, 2004; McNamara, Rump, & Werner, 2003; Shelton & McNamara, 1997, 2001; Rieser, 1989). We recently completed a detailed experimental paper that supported the construct validity of the JRD task (Huffman & Ekstrom, 2019b). For example, our results suggested that participants recruit similar underlying representations to solve the JRD task and a map drawing task, which is perhaps the strongest example of a task that taps into holistic spatial knowledge. Another advantage of the JRD task in our particular case is that it is easily employed in the scanner (compared to map drawing) and has no particular advantage that is evident for visual versus vestibular cues: All pointing is done based on imagined heading, and the only cues are text (reading) cues. A final advantage is that it allowed us to match behavioral performance, as we discussed in the previous section. Therefore, we believe that the JRD task was an appropriate and valid choice for spatial retrieval and one that provided perhaps the most insight into what we think of when we think of a "spatial representation" or "cognitive map": one that, like a map, is referenced to other landmarks and provides insight into participants' abstract, holistic knowledge of the environment. We also want to clarify that we examined participants' behavior during navigation and we reported similar patterns of changes in excess path length between conditions, thus providing a measure of spatial memory performance during navigation.
On the basis of our finding of the similarity both of behavioral performance and of our neuroimaging analyses, we suggested (similar to others in the field) that tasks that require more of a holistic, abstract representation of the environment might place fewer demands on body-based cues (e.g., the JRD task and map drawing tasks; Huffman & Ekstrom, 2019a; Waller & Greenauer, 2007; Waller, Loomis, & Haun, 2004). On the other hand, body-based cues might play a stronger role in the performance of tasks that require the navigator to keep track of themselves relative to a salient landmark (or landmarks) in the environment (cf. Waller, Loomis, & Steck, 2003). These tasks would place greater emphasis on path integration and egocentric-based navigation (e.g., Ruddle, Volkova, Mohler, & Bülthoff, 2011; Ruddle & Lessels, 2006; Waller et al., 2004; Chance, Gaunet, Beall, & Loomis, 1998; Klatzky, Loomis, Beall, Chance, & Golledge, 1998). In fact, previous behavioral research has provided evidence of a double dissociation between performance on tasks that encourage participants to form an abstract, holistic representation of the environment (e.g., the JRD task) and tasks in which participants are asked to point to landmarks in the environment from their current location and orientation (e.g., Waller & Hodgson, 2006). Therefore, as we discussed in our previous paper, it will be interesting for future studies to test the role of body-based cues on the performance of these different types of tasks (Figure 1B). For example, Waller et al. (2004, p. 162) argued that "the effect of body-based information on developing complex configural knowledge of spatial layout (as opposed to knowledge of self-to-object relations) may be minimal" (also see Waller & Greenauer, 2007). To elaborate on this idea, we submitted the F statistics from the results of their map drawing tasks to a Bayes factor analysis (Faulkenberry, 2019), and we found that BF01 = 16.7 (F(2, 69) = 1.43; Waller et al., 2004, p. 161) and BF01 > 30 (F(2, 81) < 1; Waller & Greenauer, 2007, p. 329), indicating that the observed data are approximately 16 and more than 30 times more likely under the null hypothesis than the alternative hypothesis for these two studies, respectively. Therefore, we agree with Steel et al. (2020) that it will be important for future studies to determine the conditions under which body-based cues contribute to human spatial memory, and we think that tasks that vary along this continuum will be an important topic for such studies. Our study nonetheless provides an important boundary condition for understanding when body-based cues might not play a fundamental role, that is, for the retrieval of abstract, holistic spatial knowledge.
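As a rough illustration of how a Bayes factor can be recovered from a reported F statistic, the following sketch assumes the BIC-based approximation BF01 ≈ sqrt(n^df1 × (1 + F × df1/df2)^(−n)). The function name `bf01_from_f` is hypothetical, and the sample sizes used (n = 72 and n = 84) are assumptions inferred here from the error degrees of freedom plus three groups; they are not stated in the text above.

```python
import math


def bf01_from_f(f_stat, df1, df2, n):
    """Approximate BF01 (evidence for the null) from an ANOVA F statistic.

    Assumes the BIC-based approximation:
        BF01 ~= sqrt(n**df1 * (1 + f_stat * df1 / df2) ** (-n))
    where n is the number of independent observations (an input the
    caller must supply; here it is inferred, not taken from the source).
    """
    # Work on the log scale to avoid overflow for large n.
    log_bf = 0.5 * (df1 * math.log(n) - n * math.log1p(f_stat * df1 / df2))
    return math.exp(log_bf)


# F(2, 69) = 1.43, assuming n = 69 + 3 = 72 observations.
print(round(bf01_from_f(1.43, 2, 69, 72), 1))

# F(2, 81) < 1, assuming n = 81 + 3 = 84; F = 1 gives the lower bound on BF01,
# because BF01 grows as F shrinks.
print(bf01_from_f(1.0, 2, 81, 84) > 30)
```

Under these assumptions, the first call gives a value that rounds to the reported 16.7, and the second confirms that any F below 1 yields a Bayes factor above 30.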

THE MODALITY-DEPENDENT HYPOTHESIS VS. THE MODALITY-INDEPENDENT HYPOTHESIS
In this section, we will address Steel et al.'s (2020) fourth criticism, which argued that our distinction between the modality-independent hypothesis and the modality-dependent hypothesis was based on a false dichotomy. In our paper, we outlined two theoretical constructs, both of which have support within the field of spatial navigation. The first is the modality-independent hypothesis, which argues that human spatial representations do not depend on the manner in which they were encoded and, at least in some instances, distill to the same modality-independent spatial representation regardless of how they were encoded (Giudice, Betty, & Loomis, 2011; Wolbers, Klatzky, Loomis, Wutte, & Giudice, 2011; Avraamides et al., 2004; Loomis, Lippa, Klatzky, & Golledge, 2002; Bryant, 1997; Taylor & Tversky, 1992). The second is what we termed the modality-dependent hypothesis, which argues that the encoding modality will ultimately affect the manner in which participants encode and retrieve spatial representations (Taube, Valerio, & Yoder, 2013). As stated in Taube et al. (2013), navigating in desktop-based virtual environments involves "conditions [in which] idiothetic cues and the path integration system would not be activated," which, they go on to say, "Only activate[s] a portion of the neural network that is engaged during more naturalistic conditions that involve active movement." This is consistent with similar arguments about the cognitive map pointing out the fundamental importance of path integration to how rodents represent space (Moser & Moser, 2008; McNaughton, Battaglia, Jensen, Moser, & Moser, 2006; O'Keefe & Nadel, 1978).
We appreciate Steel et al.'s (2020) consideration about the viability of these competing hypotheses; however, the question about whether we proposed a false dichotomy comes down to a number of important questions about the proposed nature of spatial representations. First, do body-based cues play a fundamental role in the representation of spatial information? That is, without such cues, would representations (and behavior) fail to be stable, no matter how much experience an animal has with an environment? Alternatively, do body-based cues contribute to spatial representations by augmenting the representations or enhancing the rate of spatial learning? In this case, we would expect that animals might take longer to learn an environment in the absence of body-based cues; however, once the environment is well learned, we might expect the representations (e.g., the "cognitive map") to look similar. Second, which sensory modalities are most relevant to the formation of spatial representations, and how do these differ between species? For example, it is possible that vision plays a more predominant role in human cognition (e.g., Ekstrom, 2015; Posner, Nissen, & Klein, 1976) and that rodents rely on body-based cues (and other cues such as olfaction and sensory input from their whiskers) to a greater extent. Third, do humans and other animals have holistic, abstract, and centralized spatial representations? Previous research has suggested that this certainly need not be true. For example, a neural network model of navigation in insects (e.g., the desert ant, honeybees) suggests that these animals do not have an integrated, coherent "cognitive map" but instead have separate dedicated systems for different modalities (e.g., path integration vs. landmark-based memory; Cruse & Wehner, 2011; also see Collett, Chittka, & Collett, 2013).
In addition to the evidence reviewed above, other VR paradigms provide further support for the notion that vision might be the predominant cue that humans use during navigation. Briefly, a VR technique called redirected walking is used to allow participants to visually navigate in large-scale virtual environments while they physically walk within smaller-scale spaces (for a review, see Nilsson et al., 2018). These techniques vary in how they are implemented, but the key idea is that small, subthreshold visual rotations are induced, and these cause participants to turn their bodies slowly as they walk, often without being aware of these rotations. Interestingly, Hodgson, Bachmann, and Waller (2008) found that participants performed similarly on the JRD task for environments that they learned under natural walking conditions and under redirected walking (i.e., when visual cues and body-based cues are in competition with each other). To elaborate on this idea, we submitted the F statistic from Hodgson et al. (2008, p. 19; within-participant manipulation, two conditions with 49 participants: F(1, 47) = 0.18, p = .68) to a Bayes factor analysis (Faulkenberry, 2019), and we found that BF01 = 6.32, indicating that the observed data are approximately six times more likely under the null hypothesis than the alternative hypothesis. This provides further evidence that holistic, abstract spatial representations are not strongly affected by body-based cues.
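The arithmetic behind this Bayes factor can be sketched as a worked equation, assuming the BIC-based approximation for converting an F statistic into a Bayes factor. The value n = 48 independent observations is an assumption inferred here from the error degrees of freedom (47) for a two-condition within-participant design; it is not stated in the text above, but it reproduces the reported value.

```latex
% BIC-based approximation (assumed), with n independent observations:
\[
\mathrm{BF}_{01} \approx \sqrt{\, n^{df_1} \left(1 + \frac{F \, df_1}{df_2}\right)^{-n} }
\]
% Plugging in F(1, 47) = 0.18 with the assumed n = 48:
\[
\mathrm{BF}_{01} \approx \sqrt{\, 48 \left(1 + \frac{0.18}{47}\right)^{-48} }
\approx \sqrt{39.95} \approx 6.32
\]
```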
On the basis of the evidence above, we argue that we did not propose a false dichotomy between the modality-dependent hypothesis and the modality-independent hypothesis (Huffman & Ekstrom, 2019a). We agree that future research could elucidate whether different species tend to exhibit more evidence for one of these hypotheses, with our recent work providing some support for the modality-independent hypothesis for large-scale spatial memory in humans (Huffman & Ekstrom, 2019a). We do not think, however, that there is enough compelling evidence to outright reject the modality-dependent hypothesis in all cases; therefore, we disagree that we proposed a false dichotomy. Instead, we suggest that our results provide an important boundary condition for when body-based cues might not be expected to contribute to human spatial memory.

THE IMPORTANCE OF SCALES OF SPACE
In addition to the points addressed above, we would like to reiterate from our previous discussion (Huffman & Ekstrom, 2019a) that it is important to consider the scale of the environments that are used in the study of the role of body-based cues. For example, it is possible that body-based cues might be more important in smaller environments. Previous research has suggested that the ability to stay oriented within environments solely using body-based cues is typically unreliable and degrades rapidly because of error accumulation in the path integration system, even over relatively short distances (e.g., Kim, Sapiurka, Clark, & Squire, 2013; for similar arguments, see Eichenbaum & Cohen, 2014). For example, blindfolded participants have been shown to walk in circles within a relatively short diameter (e.g., 20 m), thus suggesting that body-based cues alone are insufficient for accurate wayfinding in large-scale spaces (Souman, Frissen, Sreenivasa, & Ernst, 2009). Therefore, we argue that it will be important for future studies to investigate whether there is a difference in the relative contribution of body-based cues at different scales of space (see Figure 1B). Specifically, we predict that body-based cues play a stronger role in smaller scale environments (see also Warren et al., 2017; Chrastil & Warren, 2014; Foo, Warren, Duchon, & Tarr, 2005; Loomis et al., 1993). This distinction is important, especially because most neuroscience studies in rodents have taken place in small-scale "vista" spaces in which the entire environment is immediately visible to the navigator (e.g., a 1 m × 1 m open arena during a random foraging task).
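The error-accumulation argument can be illustrated with a toy simulation. The following is a minimal sketch under assumed parameters (unit-length steps, independent Gaussian heading noise of 0.05 rad per step, and no visual correction). These values and the function name `mean_position_error` are hypothetical and are not taken from any of the studies cited above.

```python
import math
import random


def mean_position_error(n_steps, heading_noise_sd=0.05, n_trials=500, seed=0):
    """Mean distance between the intended and actual end points of a walk.

    The simulated walker intends to move straight along the x axis in unit
    steps. On every step, a small random heading error is added to the
    accumulated heading error and is never corrected (i.e., no vision).
    """
    rng = random.Random(seed)
    total_error = 0.0
    for _ in range(n_trials):
        heading_error = 0.0
        x = 0.0
        y = 0.0
        for _ in range(n_steps):
            heading_error += rng.gauss(0.0, heading_noise_sd)
            x += math.cos(heading_error)
            y += math.sin(heading_error)
        # The intended end point is (n_steps, 0).
        total_error += math.hypot(x - n_steps, y)
    return total_error / n_trials


short_walk_error = mean_position_error(10)
long_walk_error = mean_position_error(100)
print(short_walk_error, long_walk_error)
```

In this toy model, the mean error grows with path length, both in absolute terms and relative to the distance walked, which is the qualitative pattern that motivates the prediction about scales of space. It is not a model of the human path integration system.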
The issue of scales of space is also directly relevant to the study of spatial coding in the rodent brain. Rats explore large-scale spaces in the wild, including large underground tunnels. For example, rats have been shown to explore very long distances from their home environment (e.g., Russell et al., 2010; Calhoun, 1963). Recent computational modeling research suggested that the path integration codes afforded by grid cells would become severely disrupted over large-scale, multicompartment environments, such as rodents would encounter in their natural habitat (Stella, Urdapilleta, Luo, & Treves, 2020). Thus, path integration codes from grid cells could only enable accurate path integration over short distances. Although these predictions remain to be tested in electrophysiological studies, these findings again point to the importance of considering the scale and complexity of the environment. Thus, we agree with the conclusion that findings from small-scale, vista spaces with regularly shaped, flat environments might not scale to real-world environments with naturalistic behavioral demands (Stella et al., 2020).

WHAT IS THE METRIC OF THE "COGNITIVE MAP"?
We would also like to push back against the notion raised by Steel et al. (2020) that the concept of the cognitive map is universally accepted. In fact, behavioral experiments have led to debates about the nature of spatial representations in humans (e.g., Chrastil & Warren, 2014; Tversky, 1993; McNamara, 1991; Moar & Bower, 1983; for a comprehensive review, see Warren, 2019) and nonhuman animals (e.g., Cheung et al., 2014; Collett et al., 2013; Cruse & Wehner, 2011; Benhamou, 1996; Bennett, 1996). For example, Warren et al. (2017) found that human spatial navigation performance is better accounted for by a labeled graph than by Euclidean knowledge. Specifically, when the very metric of the space was disrupted (via wormholes that translated and rotated participants through the environment), participants failed to notice these inconsistencies in the environment and readily adapted these into their spatial knowledge, thus suggesting that they formed non-Euclidean representations of the environment. Importantly, these experiments were conducted when participants had full access to body-based cues while wearing a head-mounted display. Moreover, previous research and theoretical views have suggested that humans frequently rely on heuristics rather than metric Euclidean representations of spatial environments (e.g., the cognitive collage: Tversky, 1993; labeled graphs: Warren, 2019; Warren et al., 2017; Chrastil & Warren, 2014).
Much of what we know about the neuroscience of navigation has come from electrophysiological investigations of the rodent brain (e.g., Hafting, Fyhn, Molden, Moser, & Moser, 2005; Taube, Muller, & Ranck, 1990; O'Keefe & Nadel, 1978; O'Keefe & Dostrovsky, 1971). These studies have revealed an abundance of potential underlying mechanisms supporting navigation. For example, place cells, head-direction cells, grid cells, and border cells (among others) could potentially contribute to the structure of spatial knowledge, that is, an underlying metric that allows animals to navigate (e.g., Moser & Moser, 2008). However, this leads to an important question: How do we reconcile the seemingly disparate views of heuristic-based behavior and the seemingly metric-like representations in the brain (Figure 2)? In brief, we think the best approach forward will be to place a strong emphasis on trying to understand how the brain and behavior give rise to latent cognitive states. Importantly, such investigations should rely on the method of converging operations, in which we seek to find similar results between multiple conditions and approaches (e.g., McNamara, 1991).
Recent theoretical views have argued that cognitive neuroscientists often focus on the neuroscience at the expense of understanding the cognition or the behavior of the organism (Poeppel & Adolfi, 2020; Krakauer, Ghazanfar, Gomez-Marin, MacIver, & Poeppel, 2017). Many studies on the neuroscience of navigation have focused on investigating nonhuman animals under relatively constrained conditions with little to no measures of behavior or performance. Such "reduced preparations" could lead to a fundamental gap in our understanding of the processes that we really care about with respect to high-level cognitive tasks, that is, a better understanding of the mental processes of the animal. For example, Krakauer et al. (2017) argued that we should first conduct detailed analyses of behavior before seeking to understand the underlying neural implementation. They discuss the important difference between Marr's (2010) algorithmic level (i.e., the computations supporting behavior; the software) and the implementational level (i.e., the neural processes supporting behavior; the hardware), and they argue that the algorithmic level can best be understood by designing clever behavioral experiments that are part of the natural repertoire of the organism in question. We argue that we should take a similar approach to understanding the connection between neural responses, latent cognitive representations, and spatially guided behavior, such as navigation. We further suggest that these three levels exist in a dynamic interplay in which each level can affect the other levels (see Figure 2). Because of length restrictions, we will refer the interested reader to other papers in which we have discussed these issues in more detail (Ekstrom, Harootonian, & Huffman, 2020; Ekstrom, Huffman, & Starrett, 2017; for a more general discussion, see Poeppel & Adolfi, 2020; Krakauer et al., 2017).

Figure 2. It is important that future studies of spatial cognition consider the relationship between behavior, the brain, and latent cognitive states. Many neuroscience studies have focused on understanding relationships between the brain and the surrounding environment of the animal (e.g., place cells, head-direction cells, grid cells, border cells). Conversely, many studies have used behavioral measures to try to understand the latent underlying cognitive representations supporting human spatial memory. These studies have provided evidence that spatial knowledge is typically non-Euclidean. In contrast, spatial cells in the rodent brain appear to be more Euclidean in nature. Thus, it will be important for future research to determine how these seemingly disparate findings fit together. We argue that a more comprehensive understanding of spatial cognition can be obtained by considering the relationships between these various measures. (Note: fMRI figure from Huffman & Ekstrom, 2019a.)

OPEN QUESTIONS
We would also like to raise several remaining questions:
1. Does the role of body-based cues differ as a function of behavioral task? For example, do body-based cues play a stronger role for the performance of tasks that emphasize self-to-object representations versus tasks that emphasize a holistic, abstract spatial representation (see Figure 1B; cf. Waller & Greenauer, 2007; Waller et al., 2004)?
2. How do behavioral demands alter spatial representations in humans and rodents? Does the role of body-based cues in the rodent still differ under conditions in which behavioral performance is matched (e.g., allowing one to rule out confounds of behavioral performance or spatial attention)?
3. Does the role of body-based cues differ at different scales of space (see Figure 1B)? Specifically, do body-based cues play a stronger role on memory for small-scale spatial environments (i.e., because of error accumulation in the path integration system over longer distances)? Relatedly, how do spatial representations differ across spatial scales?
4. What is the relationship between patterns of brain activity (e.g., place cells, head-direction cells, grid cells, and border cells), behavioral expression, and latent cognitive states (see Figure 2)?
5. The study of human navigation has revealed that there are substantial individual differences in navigation ability more generally (e.g., Hegarty et al., 2006; for similar findings in bats, see Harten et al., 2020). Are there similar individual differences in the use of body-based cues (e.g., professional 3-D video game players vs. orienteers)?
6. What is the role of active decision-making versus the role of body-based cues per se (cf. Chrastil & Warren, 2012)? As we discussed, many rodent studies conflate these two variables; thus, a better understanding of the role of these processes can be obtained by separating the task demands from the mode of locomotion.
7. We agree that it is unlikely that fMRI technology will advance in the near future to allow the acquisition of data while participants actively navigate. Thus, future research can focus on studying the role of body-based cues by using existing mobile technology such as mobile EEG (e.g., Djebbara et al., 2019; Jungnickel et al., 2019; Park & Donaldson, 2019; Park et al., 2018), functional near-infrared spectroscopy, or intracranial recordings in human patients (e.g., Topalovic et al., 2020; Aghajan et al., 2017; Bohbot et al., 2017). In the future, mobile PET and MEG (Boto et al., 2018) might provide solutions. What kinds of codes can we obtain with such methods? Will they allow us to understand the brain areas involved in spatial cognition? One potentially exciting approach could be to first find evidence of neural differences using mobile EEG and to then test those participants using fMRI (e.g., to determine the brain regions that are involved in such differences).
8. What is the relationship between laboratory-based experiments in the rodent (e.g., random foraging within a 1 m × 1 m open arena) and ecologically valid, large-scale navigation under naturalistic behavioral demands (cf. Wehner, 2020, Chapter 7; Jacobs & Menzel, 2014)?

CONCLUSION
In conclusion, we have many points of agreement with Steel et al. (2020). In fact, we made many similar arguments in our original paper (Huffman & Ekstrom, 2019a). We also raised several points of disagreement here. Therefore, we clarified points of potential misunderstanding, and we aimed to provide a constructive discussion of how future research with humans and nonhuman animals can answer interesting questions about the neuroscience of spatial cognition. Furthermore, because humans and rodents are different species, we should not only replicate findings between species but also design tasks that tap into the specific skills and cues that different species use to navigate. Accordingly, we think that the path forward is a wide set of behavioral and neural assays under varying levels of body-based cues and naturalistic designs. With such designs and experiments, we can better delineate the boundary conditions under which vision versus body-based cues are fundamental to how we navigate.