Abstract
Modern driver-assist and monitoring systems are severely limited by the lack of a precise understanding of how humans localize and predict the position of neighboring road users. Virtual Reality (VR) is a cost-efficient means to investigate these matters. However, human perception works differently in reality and in immersive virtual environments, with visible differences even between different VR environments. Therefore, when exploring human perception, the relevant perceptive parameters should first be characterized in the specific VR environment. In this paper, we report the results of two experiments that were designed to assess localization and prediction accuracy of static and moving visual targets in a VR setup developed using broadly available hardware and software solutions. Results of the first experiment provide a reference measure of the significant effect that distance and eccentricity have on localization error for static visual targets, while the second experiment shows the effect of time variables and contextual information on the localization accuracy of moving targets. These results provide a solid basis to test in VR the effects of different ergonomics and driver-vehicle interaction designs on perception accuracy.
1 Introduction
Advanced Driver Assist Systems (ADAS) support and assist drivers in critical situations, and are thought to be one of the main drivers in the decreasing trend of road accidents and fatalities that has been observed over the last two decades (Bengler et al., 2014; European Road Safety Observatory, 2021). Immersive Virtual Reality (VR) simulations are commonly adopted by research centers and automotive companies in the early stages of testing and validation of ever more sophisticated ADAS. Performing experiments in immersive VR offers indisputable advantages, including higher repeatability, lower costs of modifying the experimental conditions, greater control over study variables, and the opportunity of testing extreme scenarios without the ensuing risks (Slater & Sanchez-Vives, 2016). Despite great improvements in terms of realism and immersion, however, immersive VR simulations are still affected by severe limitations in the way people perceive the virtual environment compared to the real world (Buck, Young, & Bodenheimer, 2018; Hayhoe et al., 2002).
The information collected through visual perception is believed to be used by the brain to create an internal representation of the surrounding physical space, called the visual space. Localization accuracy therefore depends both on the human ability to perceive and place a target in the visual space and on the relation between visual and real space. It is known to depend mainly on two spatial factors: the target's distance and the target's eccentricity, that is, its angular distance from the focal point of the field of view. As we discuss next, eccentricity and distance play distinct roles, both in determining accuracy and in characterizing how perception differs between the real environment and immersive VR.
Perception accuracy is not uniform across the retina. The fovea, which corresponds to the central area, covering only the central few degrees of visual angle (Wolfe, Dobres, Rosenholtz, & Reimer, 2017), offers higher visual resolution due to its greater cortical representation, while peripheral vision contributes to creating a fast but lower-detailed representation of the environment (Trouilloud et al., 2020). The influence of eccentricity on accuracy has been extensively studied in the automotive and driving context. The eccentricity of a visual stimulus is known to play a major role in visual and attentional processes (Adam, Davelaar, Van Der Gouw, & Willems, 2008; Carrasco, Evert, Chang, & Katz, 1995; Carrasco, McElree, Denisova, & Giordano, 2003; Edwards & Goolkasian, 1974; Staugaard, Petersen, & Vangkilde, 2016). The Functional or Useful Field of View (UFOV) (Ball, Beard, Roenker, Miller, & Griggs, 1988; Sanders, 1970) is defined as the total visual field area in which useful information can be acquired within one eye fixation (i.e., without eye or head movements). Its assessment represents an established measure of the capabilities of a driver's peripheral vision. The size of the UFOV varies significantly across individuals, situations, and the life span, with a deterioration that begins early in life, starting at the age of 20 (Sekuler, Bennett, & Mamelak, 2000). UFOV size has proved to be a good predictor of traffic crash risk (Ball, Owsley, Sloane, Roenker, & Bruni, 1993; Clay et al., 2005), while experienced drivers were found to have a wider field of view with more accurate peripheral vision than less experienced drivers (Crundall, Underwood, & Chapman, 1999, 2002). Nevertheless, it is also important not to oversimplify this concept by treating everything inside the UFOV as perceived and everything outside it as not perceived (Wolfe et al., 2017). For instance, the availability of additional information at large eccentricities was found to correlate with safer driving behavior in hazardous situations (Alberti, Shahar, & Crundall, 2014). Moreover, information gathered through peripheral vision is useful for the correct execution of several aspects of the driving process: lane-keeping performance has been shown to be affected by the eccentricity of a secondary visual task (Summala, Nieminen, & Punto, 1996), while the detection of salient stimuli (i.e., a braking lead vehicle) located at different peripheral locations was found to be significantly affected by the eccentricity of the stimulus with respect to the center of the driver's field of view (Lamble, Laakso, & Summala, 1999). More recently, different cognitive load and eccentricity conditions were considered while assessing the perception accuracy of visual stimuli (i.e., brake lights). Results showed that participants' responses were slower and less accurate as the eccentricity of the stimuli increased, and that this effect was stronger than the one linked to different cognitive loads (Wolfe, Sawyer, Kosovicheva, Reimer, & Rosenholtz, 2019). Similarly, in Svärd, Bärgman, and Victor (2021), secondary tasks were performed at different eccentricities during an unexpected lead vehicle event. Results showed that even though the driver's glance response time was not affected by eccentricity, the brake response time increased with increasing task eccentricity. Eccentricity therefore affects the accuracy with which a target is placed in the visual space, and consequently in the real space.
Regarding the different impact of eccentricity on perception in immersive VR and in reality: while typical Head-Mounted Displays (HMDs) restrict the span of the field of view below the maximum human capability, they do not significantly affect the UFOV, which according to reference tests (Ball et al., 1988) usually spans little more than 30° to either side of the focal point. Immersive VR therefore does not appear to significantly change the effect of eccentricity on perception accuracy.
The same does not hold for distance. Several findings suggest that visual space cannot be considered a well-structured geometrical entity, but is instead expanded in the horizontal direction (i.e., what we defined as eccentricity) and thus anisotropic (Cuijpers, Kappers, & Koenderink, 2000; Doumen, Kappers, & Koenderink, 2006). The anisotropy of the visual space has been observed both in real environments (Doumen et al., 2006; Kelly, Loomis, & Beall, 2004) and in immersive VR environments, showing discrepancies between the perceived distances of objects located at different eccentricities (Peillard et al., 2019). In real scenarios, humans are able to perceive the distance to an object with great accuracy at distances up to 20 meters (Wu, Ooi, & He, 2004). In immersive VR environments, instead, individuals tend to underestimate egocentric distances, that is, distances from oneself to a target (Cutting & Vishton, 1995). The factors causing this distance compression, which in experimental setups typically involved targets between 1.5 and 30 m from the subject (Buck et al., 2018; Renner, Velichkovsky, & Helmert, 2013), are however far from being clearly identified. Some works attribute distance compression to the working and general characteristics of HMDs and therefore relate it to the specifications of different HMD models (Buck et al., 2018). Other works instead attribute distance compression uniquely to the reduced span of the available field of view when using an HMD (Masnadi, Pfeil, Sera-Josef, & LaViola, 2022), and yet other works reject the existence of any effect of field of view span on distance compression (Knapp & Loomis, 2004). Lacking a better understanding of the relation between perception of the real environment and perception in immersive VR with different HMD models, tests run in immersive VR need to be based on a well-calibrated, HMD-model-specific characterization of the main perception parameters.
All the above analyses concern the accuracy of localization of a static target. The localization and prediction of the trajectory of moving targets add an extra layer of complexity, being also affected by the availability of internal models of the physical characteristics of the target and of the visual environment in which the target moves (Bosco et al., 2015). Investigation of vision in the natural world has revealed that the pattern and duration of fixations are highly specialized for each situation (Hayhoe et al., 2002). Predicting the future position of a moving object is a common task that most people perform on a daily basis. Activities such as crossing roads, avoiding other pedestrians, and playing sports (Flavell et al., 2018) involve making decisions based on our perception of when a moving object will reach a specific point. When investigating the processes that allow humans to predict the future position of a moving object, researchers have typically focused on the time-to-collision paradigm, in which a moving target is visible for some defined time interval and then disappears, and participants are asked to estimate when the target would have reached or passed a specific point, had it continued on its initial path. Previous studies indicated that the time interval during which a target is not visible significantly affects performance, so that prediction accuracy is lowest when the prediction time (i.e., the temporal horizon over which the prediction is made) is longest (Lyon & Waag, 1995; Peterken, Brown, & Bowman, 1991). Several studies also considered effects due to contextual information available from the environment and information acquired not only through visual perception but also through auditory cues (Chang & Jazayeri, 2018; DeLucia & Liddell, 1998; Qin et al., 2021), or at different levels of visual degradation (Hecht, Brendel, Wessels, & Bernhard, 2021). Nevertheless, most of the research based on this paradigm focused on the accuracy of the time estimates at which a visual moving target would reach a specific point. Very little is known about the accuracy with which a subject can predict the position of a moving object by providing spatial estimates based exclusively on visual information acquired at the beginning of the movement, in the absence of explicit visuospatial references relating to the possible destination. This is particularly relevant when testing road driving scenarios, where the ability to correctly predict the trajectories of nearby objects heavily affects driving behavior and safety.
The work that we present in this paper aims at characterizing the accuracy with which people both localize static visual targets and predict the motion of moving targets, in a very common and broadly available immersive VR setup, using a Meta Quest 2 HMD and two environments created with the Unity game engine. First, we considered static targets and focused on visuospatial features known to affect visual perception in real-world scenarios, such as targets' egocentric distance and horizontal eccentricity. Second, we focused on temporal features, and we investigated how the accuracy in predicting the future position of a moving visual target is affected by the amount of time the target is visible, as well as by other factors, such as the time horizon over which the movement is performed and the contextual information available from the environment.
The results of these experiments provide a reference measure of human visual perception and prediction accuracy in an immersive VR environment, therefore representing a baseline on which to construct easily reproducible immersive VR testing scenarios for driving applications.
2 Methods
In this section, we present the setup and procedure of the two experiments we conducted. In the first experiment, we analyzed the role played by visuospatial variables in human perception and localization accuracy in an immersive VR environment. In the second experiment, we focused on the effect that temporal variables have on people's accuracy in predicting the position of a moving target in an immersive VR environment.
2.1 Participants
The study involved a total of 51 participants (28 men, 23 women) aged between 20 and 34 (M = 25.6; SD = 2.8) who voluntarily took part in the experiment. The vast majority were right-handed (90.2%), and all had normal or corrected-to-normal vision. Possessing a driving license and being used to right-hand traffic (RHT) (i.e., driving on the right side of the road) were the two main requirements for participating in the study. The majority of the participants had held their driving license for more than 5 years (68.6%), 25% for more than 1 year, and only 3 participants had obtained it less than 12 months before joining the study.
Each participant provided written informed consent before taking part in the study. Written consent and all methods were carried out in accordance with the principles of the Helsinki Declaration. All data was collected anonymously. The study was approved by the Politecnico di Milano Ethical Committee.
2.2 Procedure
Before starting the experiment, participants were asked to read and sign a consent form. Then, they were instructed about the structure of the study. After this, participants were asked to fill in a questionnaire rating their experience with immersive VR technology, the number of years since they had obtained their driving license, and their hand dominance. Once the questionnaire was completed, participants were asked to wear the HMD, and they were guided through a short training session for the first experimental task. When they felt confident with the procedure and the task, they were allowed to proceed with the real experiment. This entire first phase lasted about 10 minutes. Once Experiment 1 was completed, participants were free to remove the HMD and take a break while the experimenter introduced the second part of the study. Once participants were ready, they proceeded with a short training session for the second experimental task to familiarize themselves with it. In this case, they were allowed to proceed with the actual experiment once they proved and confirmed that they understood the task. This second part lasted about 15 minutes. Both training phases had the same dynamics as the corresponding experimental tasks but used different target positions and maneuvers, to avoid possible learning effects that might affect the experiments. During every phase of the study, the HMD view presented to the participants was replicated on the laptop monitor to which the HMD was connected. Two experimenters monitored the execution of all the tasks that constituted the presented study.
2.3 Experiment 1
Experiment 1 followed a within-subjects design where the dependent variable was the error in participants' performance of the experimental task. Targets' eccentricity and distance represented the two within-subjects factors. The x and z axes were arranged as in Figure 1, centered at the participant's location. Study variables and accuracy measures were structured as follows:
- Eccentricity: the target's lateral position with respect to the center of the participants' field of view. For an object located at coordinates $(x, z)$, it is defined as $\theta = \arctan(x/z)$ (see the code sketch after this list). Targets could be located at an eccentricity of 0°, ±15°, or ±30°. Positive and negative eccentricities defined targets lying in the right or left hemispace, respectively, though for simplicity of the subsequent analysis, results regarding eccentricities of the same magnitude and opposite sign were analyzed together. Therefore, this variable could assume three levels: eccentricity 0°, eccentricity 15°, and eccentricity 30°. The eccentricity values used in the current study were chosen to effectively explore differences in perception accuracy between the center of the field of view, the peripheral region of the UFOV (i.e., 30°) (Ball et al., 1988), and a midpoint between these two values (i.e., 15°).
- Distance: target distance from the participants' position, measured in Unity units. This variable could assume three levels: Near = 5 units, Mid = 10 units, Far = 20 units. The distance was measured along the z axis of our VR environment, with zero at the participant's position. To ensure a correspondence between the real world and the VR environment, the scenario was built assuming that 1 Unity unit corresponds approximately to 1 meter in the real world.
- x axis error ($E_x$): the distance between the actual target position and the position indicated by participants, measured along the x axis.
- z axis error ($E_z$): the distance between the actual target position and the position indicated by participants, measured along the z axis.
- Accuracy error ($E_{acc}$): the Euclidean distance between the actual and the indicated position, defined as $E_{acc} = \sqrt{E_x^2 + E_z^2}$.
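To make these definitions concrete, the following minimal Python sketch shows how the eccentricity of a target and the three error measures can be computed from logged coordinates. It is an illustrative sketch only: the function names, and the assumption that positions are logged as (x, z) pairs in Unity units with the participant at the origin, are ours and not part of the experimental software.

```python
import math

def eccentricity_deg(x: float, z: float) -> float:
    """Horizontal eccentricity (in degrees) of a target at (x, z),
    with the participant at the origin looking along the z axis."""
    return math.degrees(math.atan2(x, z))

def localization_errors(target_xz, response_xz):
    """Return E_x, E_z, and the Euclidean accuracy error E_acc
    (all in Unity units, i.e., approximately meters)."""
    e_x = response_xz[0] - target_xz[0]
    e_z = response_xz[1] - target_xz[1]
    e_acc = math.hypot(e_x, e_z)  # E_acc = sqrt(E_x^2 + E_z^2)
    return e_x, e_z, e_acc

# Example: a "Far" target (z = 20 units) at 30 degrees of eccentricity,
# with a hypothetical pointer response.
target = (20 * math.tan(math.radians(30)), 20.0)
response = (11.0, 18.5)
print(round(eccentricity_deg(*target), 1))   # -> 30.0
print(localization_errors(target, response))
```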
Participants were asked to focus on a red fixation point located in front of them at the center of the scenario (i.e., at 0° eccentricity). To make sure that participants' gaze remained focused on the fixation point during the target presentation, they were asked to perform a simple task, consisting of aiming at the fixation point with the laser ray that originated from the index finger of their virtual hand avatar. When hit by the participants' laser ray, the fixation point slightly changed color, making it easy to check the execution of the task. The entire task was monitored by two experimenters, who were trained to discard the trial in case a participant failed to aim at the fixation point. While participants looked at the fixation point, a target object (i.e., a pedestrian) appeared at different positions, according to the levels of the two variables eccentricity and distance. In order to avoid learning effects, the spawning position presented slight random variations between trials, contained within a circle centered on the exact coordinates corresponding to the eccentricity and distance values considered, with a radius covering a small visual angle of the participants' field of view, measured on the x axis. The spawning area for each condition is represented by the colored cylinders in Figure 1. The target object was visible for 0.5 s and then disappeared. After another 0.5 s interval, the fixation point turned green, and a sound cue was provided through the HMD speakers. At this point, participants were allowed to freely move their gaze and indicate where the target had been located in the scenario.
In previous works, distances in the real world and in immersive VR were usually measured using three main methods: verbal estimates, perceptual matching, and visually directed actions. In the first method, the participant is asked to verbally estimate the distance, for example, using a reference distance unit. In perceptual matching, participants are instructed to match the distance or the size of a target object in comparison to a reference object. Finally, in visually directed actions, the participant sees the distance to a target object and then performs some kind of action toward it while blindfolded, like walking or imagined walking, or throwing a sandbag to indicate the perceived distance (Lin & Woldegiorgis, 2015; Renner et al., 2013; Sahm, Creem-Regehr, Thompson, & Willemsen, 2005). All of these methods present biases and limitations in accuracy. Furthermore, the methods considered in these works aim to highlight possible differences in perceived distances between the real world and the immersive VR environment. Since our work focused on identifying the accuracy of the perception of a visual target placed at a certain distance within an immersive VR environment, and not on comparing this distance with measurements in the real world, we decided to use a method more suited to VR, asking participants to relocate the target using a 3D pointer controlled by the movement of their dominant hand. The 3D pointer (i.e., a cylinder) was located at the end of the laser ray originating from the index finger of their VR avatar's hand (see the bottom-left corner of Figure 1). Once the pointer was in what they perceived as the target position, participants confirmed their answer by pressing the controller trigger with their index finger. The spatial coordinates of this selection were recorded alongside the actual target position.
To verify that the implemented pointing method was efficient and that the responses were free from biases due to involuntary movements or hand shaking, we ran a short test and analyzed its results. Eleven participants were asked to use our pointing method to point at a 3D visual target (i.e., a white cylinder). The target was located in front of them at a distance that changed in each trial between the same three levels used in Experiment 1: Near = 5 units, Mid = 10 units, Far = 20 units. Since the goal was to check the accuracy of the pointing method, no time restrictions were applied, and the target was visible for the entire duration of each trial, which ended when participants confirmed their response by pressing the controller trigger. We analyzed a total of 99 trials by comparing the actual position of each visual target with the response provided by participants using our pointing method (i.e., the coordinates of their 3D pointer when they pressed the trigger). Results showed that for all three distances considered, our method provided accurate pointing results, with an average accuracy error between 0.1 and 0.45 units (Near: M = 0.14, SD = 0.06, Min = 0.012, Max = 0.28; Mid: M = 0.18, SD = 0.1, Min = 0.007, Max = 0.5; Far: M = 0.45, SD = 0.23, Min = 0.018, Max = 0.96).
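As an illustration of this validation analysis, the short sketch below reproduces the kind of per-distance summary reported above. The trial data here are simulated around the reported means, purely for demonstration, since the actual logs are not part of this paper.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
# Simulated stand-in for the 99 validation trials (33 per distance level),
# drawn around the reported per-distance error means for illustration only.
trials = pd.DataFrame({
    "distance": np.repeat(["Near", "Mid", "Far"], 33),
    "e_acc": np.abs(np.concatenate([
        rng.normal(0.14, 0.06, 33),   # Near
        rng.normal(0.18, 0.10, 33),   # Mid
        rng.normal(0.45, 0.23, 33),   # Far
    ])),
})

# Per-distance descriptive statistics, analogous to those reported in the text.
print(trials.groupby("distance")["e_acc"].agg(["mean", "std", "min", "max"]))
```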
2.4 Experiment 2
In this session, the task consisted of observing and matching a target object's movements with the controlled pointer. Participants used the same pointing method as in the first experimental task. The only differences were that the 3D pointer was always visible and that participants did not have to press the trigger to confirm their prediction. The target object's movements were structured in two steps: (i) an initial maneuver following a straight line at a constant speed for 3 seconds, followed by (ii) a Maneuver of Interest (MOI) constituted by a left/right turn. The objective of the participants' task was to provide a prediction of the target position after different time intervals from the beginning of the MOI. Each MOI had two possible execution times, at the end of which participants were required to make their prediction. These two levels constituted the independent variable defined as Time of Prediction ($T_P$). While performing the MOI, the target object remained visible for different time intervals, according to the five levels of our independent variable Time of Visibility ($T_V$). When the target object disappeared, participants were asked to keep moving the pointer along what they considered to be the target's ongoing maneuver until they heard a sound cue. At that moment, the spatial coordinates of the 3D pointer position were automatically recorded, together with the actual target position.
The experiment followed a within-subjects design where the dependent variable consisted of the accuracy error ($E_{acc}$) related to participants' predictions during the experimental task. The three independent variables corresponding to the three within-subjects factors, which together define a full factorial design (a code sketch enumerating it follows the list), were:
- Time of Visibility ($T_V$): the time interval during which the target was visible, measured from the start of the MOI. This variable could assume five levels: 0.25 s, 0.5 s, 0.75 s, 1 s, and 1.25 s. The time intervals were selected according to the information collected in a pilot test conducted with 5 participants during the study design. The results allowed us to identify 0.25 s as the minimum time interval that would guarantee that the MOI was perceived by the participants. The same 0.25 s step was maintained as the difference between the five levels of the variable.
- Time of Prediction ($T_P$): the total duration of the MOI, coinciding with the time horizon over which the prediction had to be made. The variable could assume two levels: 2 s and 3 s. The MOIs carried out by the targets were turning maneuvers performed according to realistic models and in well-defined urban contexts. A maximum prediction horizon of 3 s was deemed reasonable for the full execution of a typical turning maneuver in the considered urban context.
- Maneuver of Interest (MOI): the maneuver performed by each target after a 3 s constant straight movement. All maneuvers were performed at a constant velocity that was tuned to look realistic depending on the target (bicycle or pedestrian). MOIs were: (I) bicycle left turn in the correct lane, (II) bicycle right turn in the wrong lane, (III) bicycle left turn in the wrong lane, (IV) bicycle right turn in the correct lane, (V) pedestrian turn in the absence of crosswalks, (VI) pedestrian turn towards crosswalks (see Figure 2).
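The resulting within-subjects design can be enumerated compactly, as the following Python sketch illustrates. The per-participant shuffling is an assumption made for illustration only; the randomization scheme is not part of the variable definitions above.

```python
import itertools
import random

T_V = [0.25, 0.50, 0.75, 1.00, 1.25]        # time of visibility (s)
T_P = [2.0, 3.0]                            # time of prediction (s)
MOI = ["I", "II", "III", "IV", "V", "VI"]   # maneuvers of interest

# Full factorial combination of the three within-subjects factors:
# 5 x 2 x 6 = 60 experimental conditions.
conditions = list(itertools.product(T_V, T_P, MOI))
assert len(conditions) == 60

# Hypothetical per-participant trial order, shuffled to counter order effects.
random.seed(51)
trial_order = random.sample(conditions, k=len(conditions))
print(trial_order[:3])
```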
2.5 Experimental Setup
Both experimental tasks were performed in immersive VR scenarios developed with the Unity engine (version 2020.3.23). Participants used a Meta Quest 2 headset (resolution: 1832 × 1920 pixels per eye; refresh rate: 72 Hz; horizontal field of view: approximately 90°) and a Meta Touch controller in their dominant hand. Throughout all the tasks, participants remained standing. We set 1 unit in the immersive VR environment to correspond to 1 meter in real life, and we calibrated the participant viewpoint's distance from the floor inside the immersive VR scenarios at the beginning of each task to correspond to the actual height of the participant's eyes. All the elements that appeared in the environment (road width and road markings, bicycle and cyclist, pedestrian) were sized according to the same scale, to match real-life proportions and increase the ecological validity of the scenario.
2.6 Data Analysis
We performed two normality tests (i.e., Kolmogorov-Smirnov and Shapiro-Wilk), and the results showed that the data from both experiments were approximately normally distributed. For Experiment 1, we therefore proceeded with a two-way repeated measures ANOVA with eccentricity and distance as the two within-subjects factors and $E_x$, $E_z$, and $E_{acc}$ as measures. Simple main effects were explored applying a Bonferroni correction ($\alpha$ = 0.05). For Experiment 2, we performed a three-way repeated measures ANOVA using the five levels of $T_V$, the two levels of $T_P$, and the six target maneuvers (MOIs) as within-subjects factors, with the accuracy error ($E_{acc}$) as the measure. Post hoc comparisons were performed applying the Bonferroni correction ($\alpha$ = 0.05).
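For readers wishing to reproduce this analysis pipeline on their own data, the sketch below runs the Experiment 1 two-way repeated measures ANOVA and the Bonferroni-corrected pairwise comparisons using the pingouin package. The file and column names are placeholders, not artifacts of our study; note also that pingouin's rm_anova handles at most two within-subjects factors, so the three-way ANOVA of Experiment 2 would require a different tool.

```python
import pandas as pd
import pingouin as pg

# Long-format data: one row per participant x condition.
# "experiment1_eacc.csv" and the column names are placeholders.
df = pd.read_csv("experiment1_eacc.csv")

# Two-way repeated measures ANOVA: eccentricity x distance on E_acc.
aov = pg.rm_anova(data=df, dv="e_acc",
                  within=["eccentricity", "distance"],
                  subject="participant", detailed=True)
print(aov)

# Bonferroni-corrected pairwise comparisons for the simple main effects.
posthoc = pg.pairwise_tests(data=df, dv="e_acc",
                            within=["eccentricity", "distance"],
                            subject="participant", padjust="bonf")
print(posthoc)
```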
3 Results
In this section, we report the results of the analyses conducted for both experiments. For Experiment 1, we report the differences identified in participants' performance in terms of $E_x$, $E_z$, and $E_{acc}$, depending on targets' eccentricity and distance. Descriptive statistics are reported in Table 1. For Experiment 2, we present how participants' prediction accuracy varied according to the different $T_V$ and $T_P$ values, and depending on the MOI considered. Descriptive statistics are reported in Table 2.
Table 1. Descriptive statistics of $E_x$, $E_z$, and $E_{acc}$ (in Unity units) for each combination of eccentricity and distance in Experiment 1.

| Eccentricity | Distance | $E_x$ M | $E_x$ SD | $E_z$ M | $E_z$ SD | $E_{acc}$ M | $E_{acc}$ SD |
|---|---|---|---|---|---|---|---|
| 0° | Near | −0.033 | 0.117 | −0.834 | 0.526 | 0.922 | 0.456 |
| 0° | Mid | 0.018 | 0.100 | 0.111 | 0.793 | 0.917 | 0.497 |
| 0° | Far | 0.011 | 0.125 | −0.215 | 1.905 | 1.720 | 1.166 |
| 15° | Near | 0.010 | 0.115 | −0.662 | 0.448 | 0.799 | 0.372 |
| 15° | Mid | 0.003 | 0.173 | 0.262 | 0.868 | 1.099 | 0.458 |
| 15° | Far | −0.297 | 0.379 | −0.992 | 1.455 | 2.156 | 1.036 |
| 30° | Near | 0.031 | 0.224 | −0.322 | 0.440 | 0.756 | 0.232 |
| 30° | Mid | −0.111 | 0.442 | 0.488 | 0.822 | 1.596 | 0.570 |
| 30° | Far | 0.353 | 0.775 | −1.813 | 2.010 | 3.112 | 1.595 |
Table 2. Descriptive statistics of $E_{acc}$ (in Unity units) for each MOI in Experiment 2: overall, per $T_V$ level (0.25 s to 1.25 s), and per $T_P$ level (2 s and 3 s). The Traffic law column indicates whether the maneuver complied with traffic rules, as per the MOI descriptions in Section 2.4.

| MOI | Traffic law | Overall M | SE | 0.25 s M | SE | 0.5 s M | SE | 0.75 s M | SE | 1 s M | SE | 1.25 s M | SE | 2 s M | SE | 3 s M | SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I | yes | 2.032 | 0.079 | 2.244 | 0.106 | 2.316 | 0.107 | 2.111 | 0.093 | 1.887 | 0.083 | 1.604 | 0.083 | 1.144 | 0.059 | 2.920 | 0.113 |
| II | no | 2.278 | 0.085 | 2.709 | 0.121 | 2.670 | 0.108 | 2.231 | 0.104 | 2.047 | 0.107 | 1.732 | 0.095 | 1.347 | 0.061 | 3.208 | 0.124 |
| III | no | 2.513 | 0.078 | 3.208 | 0.111 | 2.857 | 0.106 | 2.625 | 0.097 | 2.142 | 0.091 | 1.732 | 0.087 | 1.504 | 0.057 | 3.522 | 0.117 |
| IV | yes | 2.307 | 0.072 | 2.975 | 0.115 | 2.576 | 0.107 | 2.299 | 0.090 | 1.987 | 0.085 | 1.698 | 0.097 | 1.434 | 0.045 | 3.180 | 0.119 |
| V | no | 2.123 | 0.076 | 3.513 | 0.146 | 2.652 | 0.129 | 1.575 | 0.079 | 1.467 | 0.071 | 1.407 | 0.080 | 1.824 | 0.079 | 2.422 | 0.089 |
| VI | yes | 2.787 | 0.125 | 3.393 | 0.137 | 3.062 | 0.158 | 2.881 | 0.167 | 2.448 | 0.142 | 2.151 | 0.133 | 2.434 | 0.095 | 3.140 | 0.164 |
3.1 Experiment 1
3.1.1 Error on the x Axis ($E_x$)
3.1.2 Error on the z Axis ($E_z$)
Unlike for $E_x$, results highlighted a main effect of distance, F(1.29, 64.35) = 37.126; p < 0.001; $\eta_p^2$ = 0.426, but no significant effect of eccentricity. Also for this measure, a statistically significant interaction between eccentricity and distance was found, F(1.68, 84.24) = 28.009; p < 0.001; $\eta_p^2$ = 0.359. Pairwise comparisons of $E_z$ between eccentricity levels, for each distance level (see Figure 3b), showed that for "Near" objects the error on the z axis varied significantly (p < 0.001) between targets presented at an eccentricity of 0° (M = −0.834; SD = 0.526), of 15° (M = −0.662; SD = 0.448), and of 30° (M = −0.322; SD = 0.44). Moreover, considering the "Mid" distance level, a significant difference (p < 0.05) between objects shown at 0° (M = 0.111; SD = 0.793) and at 30° (M = 0.488; SD = 0.822) was highlighted. Finally, "Far" targets presented statistically significant differences in $E_z$ between all three eccentricity levels [30° (M = −1.813; SD = 2.01); 15° (M = −0.992; SD = 1.455); 0° (M = −0.215; SD = 1.905)]. On the other hand, significant differences were revealed between "Near" (M = −0.834; SD = 0.526) and the two distances "Mid" (M = 0.111; SD = 0.793) and "Far" (M = −0.215; SD = 1.905) for targets at 0° eccentricity. The "Mid" distance level differed significantly from both "Near" (M = −0.662; SD = 0.448) and "Far" (M = −0.992; SD = 1.455) for objects presented at an eccentricity of 15°. Finally, all distance levels showed significant differences (p < 0.001) for targets presented at an eccentricity of 30° ["Near" (M = −0.322; SD = 0.44); "Mid" (M = 0.488; SD = 0.822); "Far" (M = −1.813; SD = 2.01)]. The overall distribution on both the x and z axes is graphically displayed in Figure 3c.
3.1.3 Accuracy Error ($E_{acc}$)
Results showed a main effect of eccentricity, F(1.5, 74.87) = 47.78; p < 0.001; $\eta_p^2$ = 0.489, and of distance, F(1.3, 64.88) = 90.533; p < 0.001; $\eta_p^2$ = 0.644, together with a statistically significant interaction between the two variables, F(2.07, 103.54) = 24.517; p < 0.001; $\eta_p^2$ = 0.329. We then proceeded with the analysis of the simple effects to check the variation of accuracy between the three levels of eccentricity for each distance level. Pairwise comparisons with Bonferroni correction showed that $E_{acc}$ varied significantly (p < 0.01) between the three levels of eccentricity for the "Mid" and "Far" distance levels. Descriptive statistics are reported in Table 1. For "Near" targets, the only statistically significant result (p < 0.001) was for targets shown at an eccentricity of 0° (M = 0.922; SD = 0.456), which differed from targets presented at 15° (M = 0.799; SD = 0.372) and at 30° (M = 0.756; SD = 0.232). On the other hand, the comparisons made between the three distances inside each eccentricity level highlighted a statistically significant difference (p < 0.001) in terms of $E_{acc}$ between all the distance levels considered, except for the comparison between the "Near" (M = 0.922; SD = 0.456) and "Mid" (M = 0.917; SD = 0.497) distances at an eccentricity of 0°. Results are shown in Figure 3d.
3.2 Experiment 2
More specifically, results showed significant main effects of $T_V$, F(4, 200) = 338.197; p < 0.001; $\eta_p^2$ = 0.871, of $T_P$, F(1, 50) = 773.667; p < 0.001; $\eta_p^2$ = 0.939, and of target MOI, F(2.3, 114.98) = 11.682; p < 0.001; $\eta_p^2$ = 0.189, on $E_{acc}$. The analysis did not highlight a significant three-way interaction between the considered factors, F(1.35, 0.84) = 1.612; p = 0.077; $\eta_p^2$ = 0.031. Nevertheless, two-way interaction effects were found between:
- $T_V$ and $T_P$, F(4, 200) = 4.439; p = 0.002; $\eta_p^2$ = 0.082
- $T_V$ and target MOI, F(12.47, 623.28) = 11.986; p < 0.001; $\eta_p^2$ = 0.193
- $T_P$ and target MOI, F(3.45, 172.52) = 54.683; p < 0.001; $\eta_p^2$ = 0.522
Pairwise comparisons with Bonferroni correction ($\alpha$ = 0.05) revealed that, for the interaction between $T_V$ and $T_P$, every level of $T_V$ differed significantly (p < 0.001) from every other level, both within and between the two $T_P$ levels, as visible in Figure 4.
3.2.1 MOIs and the Role of Context
For the bicycle, our results showed that a left turn directed towards the correct road lane (MOI I) (M = 2.032; SE = 0.079) corresponded to a statistically significant (p < 0.005) greater accuracy in the participants' prediction compared to the mirror maneuver towards the right (MOI II) (M = 2.278; SE = 0.085), which was against the usual traffic direction. The analysis did not highlight the same difference for the specular pattern, where a right turn following the traffic rules (MOI IV) (M = 2.307; SE = 0.072) did not present significant differences in terms of accuracy compared with the mirrored maneuver towards the left (MOI III) (M = 2.513; SE = 0.078).
Considering the pedestrian MOIs, significant differences appeared between the accuracy error in predicting MOI VI (pedestrian moving towards a crosswalk) (M = 2.787; SE = 0.125) and MOI V (pedestrian crossing away from a crosswalk) (M = 2.123; SE = 0.076). Interestingly, the accuracy error was higher for targets directed toward the crosswalk. Descriptive statistics are reported in Table 2.
Nevertheless, the ANOVA results also reported two-way interactions between target MOIs and $T_V$, as well as between target MOIs and $T_P$. Focusing on the interaction with $T_V$ (see Figure 5a), pairwise comparisons highlighted a pattern consistent with the main effects for the bicycle maneuvers (MOIs I-II-III-IV) in the first two levels of $T_V$ (i.e., 0.25 and 0.5 s), while the differences between the pedestrian maneuvers (MOIs V-VI) were significant only for the three higher levels of $T_V$ (i.e., 0.75, 1, and 1.25 s). On the other hand, focusing on the comparisons of the same MOI between the different $T_V$ levels, the overall trend consisted of a higher mean $E_{acc}$ for lower $T_V$. More precisely, pairwise comparisons showed that:
- for MOI I, $E_{acc}$ in the first three $T_V$ levels (i.e., 0.25 s, 0.5 s, and 0.75 s) was significantly higher than in the two remaining levels (i.e., 1 s and 1.25 s), which in turn differed significantly from all the other conditions.
- for MOI II, all $T_V$ levels differed significantly from each other, except for 0.25 s compared to 0.5 s, and 0.75 s compared to 1 s.
- for MOI III, the pattern was similar, with all levels presenting significant differences from each other, except between 0.5 s and 0.75 s.
- for MOI IV, only two comparisons did not present statistically significant differences: the one between the 0.5 s and 0.75 s levels, and the one between 1 s and 1.25 s.
- for MOI V, the error at 0.25 s and 0.5 s differed significantly between the two levels and was significantly higher than at all the other levels.
- for MOI VI, 0.25 s and 0.5 s differed from 1 s and 1.25 s, while 0.75 s differed from all the others except 0.5 s.
For the interaction between target MOIs and $T_P$ (see Figure 5b), pairwise comparisons confirmed the previously highlighted pattern between MOIs I-II and V-VI for both levels of $T_P$. Moreover, for the second $T_P$ level (i.e., 3 s), pairwise comparisons showed a significantly lower (p < 0.05) $E_{acc}$ for MOI IV (i.e., a right turning maneuver performed according to traffic rules) (M = 3.18; SE = 0.119) compared to the specular MOI III (M = 3.522; SE = 0.117), performed against the usual traffic direction. Regarding the comparison of each MOI between the two $T_P$ levels, the analysis highlighted a significantly (p < 0.001) higher $E_{acc}$ for $T_P$ = 3 s compared to $T_P$ = 2 s, for all the maneuvers considered.
4 Discussion
The results obtained in Experiment 1 highlighted the role played by eccentricity and distance in our ability to perceive objects' positions in an immersive VR environment. In particular, accuracy varies significantly as a function of these two parameters. Generally speaking, participants committed significantly greater errors for objects located more peripherally (30°) with respect to the center of the field of view, compared to objects located at 15° or at the center (0°) of the field of view. At the same time, accuracy decreased with increasing target distance, with larger errors for far objects, describing a two-way interaction between distance and eccentricity in accordance with previous findings on the anisotropy of the visual space. Focusing on the distribution of this error, our participants found it harder to correctly estimate the distance of the targets than their lateral position with respect to the field of view, resulting in higher average errors on the z axis. These results have a double value. First, they show how the perception and localization of static visual targets in virtual environments are affected by the same visuospatial components that influence perception in the real world. Second, Experiment 1 provides an initial picture of the role of these variables in a virtual environment, providing reference measurements and showing how much distance and eccentricity can affect the accuracy with which an individual is able to locate a visual target in a virtual environment. For example, our study paradigm showed that the localization of a visual target positioned at 30° of eccentricity was affected by an error that increased sharply and almost proportionally with the target distance, roughly doubling when the distance doubled and quadrupling for distances four times greater. The same trend was not as strong at lower eccentricity levels: the differences in localization accuracy were still significant but smaller for targets positioned at 15° of eccentricity, with a much sharper increase in the accuracy error only for "Far" objects, compared to targets placed at "Near" and "Mid" distances.
The results of Experiment 2 clearly highlighted the role that the two considered time variables had in the accuracy of the prediction. Overall, the accuracy error decreased significantly as $T_V$ increased, and more accurate predictions were provided for the shorter $T_P$ interval (i.e., predicting the target position 2 s after the beginning of the MOI) than for the longer one (i.e., 3 s). Furthermore, the interaction between $T_V$ and $T_P$ proved to vary significantly between each possible combination of the two variables. These results provide an initial estimate of the effect of perception dynamics still poorly explored within immersive VR environments. The main objective of the measures obtained is to favor the adaptation to immersive VR of experimental protocols used in the real world, which are often limited to laboratory contexts (e.g., time-to-contact paradigms). In this way, it would be possible to bring participants into specific situations, allowing natural behaviors while preserving a high level of control over study variables for experimenters, resulting in a considerable increase in the validity and reliability of the data obtained.
The advantages provided by the use of immersive VR are also the basis of the interesting results observed when considering the effect of contextual information available in a realistic and ecologically valid environment. Focusing on the six maneuvers considered, the differences identified between the various MOIs deserve discussion. The analysis of the main effects of target MOIs revealed that the context in which the maneuver was performed influenced participants' accuracy. More precisely, among MOIs I, II, III, and IV performed by the bicycle (see Figure 5), those following the traffic rules (MOIs I and IV) were associated with a more accurate prediction, even if the difference is statistically significant in only one of the two pairs considered (i.e., between MOIs I and II). One hypothesis to explain this result could be related to the role that expectations due to contextual information play in our ability to perceive and interpret incomplete data to make a prediction. As shown in Figure 5a, it is interesting to note how the significant differences between the first and the second MOI are mainly present at the two lower levels of $T_V$, where the information collected to elaborate the prediction is at a minimum. In this context, the error in MOI II is significantly greater than in MOI I. It is therefore possible to hypothesize that participants, in case of uncertainty, preferred to indicate a maneuver that followed the traffic code and that in some way met their expectations. On the contrary, when the information collected on the maneuver was sufficient (i.e., for longer $T_V$), the difference in the error between the two maneuvers is considerably reduced.
However, the results are reversed for the maneuvers performed by pedestrians (i.e., MOIs V-VI). In this case, MOI VI, directed towards a crosswalk, shows a significantly greater accuracy error than the maneuver performed in the absence of the crosswalk, which seems to contradict the hypothesis formulated for the previous results. Nevertheless, considering the $T_V$ levels 0.25 s and 0.5 s visible in Figure 5a, we note not only that the differences are not significant, but also that for the shorter visibility interval the error trend is opposite to that observed in all the other conditions. The hypothesis developed in the case of bicycles could therefore be valid here as well, wherein the lack of information led participants to predict a maneuver in accordance with traffic rules, resulting in a greater error for the maneuver that instead disregarded these expectations. For all the other cases, it seems instead that the presence of the contextual information represented by the pedestrian crossing led participants to more or less automatically orient their prediction towards this stimulus, reducing the importance given to information such as the speed or direction of the maneuver.
5 Conclusions
In the presented study, we evaluated the accuracy with which humans can perceive, localize, and predict the position of a visual target in an immersive VR environment. In the first experiment, we demonstrated how distance and eccentricity significantly affect the accuracy with which an individual can locate a static target, while in the second experiment, we measured the accuracy with which an individual is able to predict the position of a moving visual target in an immersive VR environment, considering different time intervals during which the target is visible and different time horizons on which to base the prediction. The increase in the ecological validity of the environment provided by the use of immersive VR allowed us not only to test the effect of the considered variables but also to evaluate the impact that contextual information can have on the accuracy of the prediction. The replication of these results, and their extension, should be made simpler by the fact that we used easily accessible hardware and software solutions. However, the environments that we used in the simulation were still much simpler than typical urban scenarios. Future improvements should work on this aspect, as well as on investigating the effects of the simultaneous presence of multiple targets. Another limitation concerns the mean age of our participants, all of whom were under 35. Since age is a known influencing factor of the UFOV range, future developments should try to extend the participants' age range to identify possible differences between age groups. Finally, in the two reported experiments we implemented the same pointing methodology. As reported in the Methods section, our choice was guided by the desire to optimize the task for a VR environment. Still, different methodologies have been used in previous works to report distances both in VR and in real-world setups. Future works might apply and compare different pointing methods within the presented experimental setup in order to establish which is the most reliable in terms of accuracy.
To conclude, we expect that the reported results will constitute a valid baseline on which to quantify, in immersive VR, the impact that different design choices (e.g., cockpit layout) and driver-vehicle interaction modes may have on the ability of the driver to perceive surrounding road users.
Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Author Contributions Statement
All authors conceived the experiments; N.D., L.R., and M.S.L. conducted the experiments. N.D. analyzed the results. A.C. and F.F. supervised the project. All authors reviewed and approved the manuscript.
Competing Interests
The authors declare no competing interests.