The objective of this article is to conduct a narrative literature review on multisensory integration and propose a novel information processing model of presence in virtual reality (VR). The first half of the article introduces basic multisensory integration (implicit information processing) and the integration of coherent stimuli (explicit information processing) in the physical environment, which offers an explanation for people's reactions during VR immersions and is an important component of our model. To help clarify these concepts, examples are provided. The second half of the article addresses multisensory integration in VR. Three models in the literature examine the role that multisensory integration plays in inducing various perceptual illusions and the relationship between embodiment and presence in VR. However, they do not relate specifically to presence and multisensory integration. We propose a novel model of presence using elements of these models and suggest that implicit and explicit information processing lead to presence. We refer to presence as a perceptual illusion that includes a plausibility illusion (the feeling that the scenario in the virtual environment is actually occurring) and a place illusion (the feeling of being in the place depicted in the virtual environment), based on efficient and congruent multisensory integration.
1 Multisensory Integration in the Physical Environment
Prior to the 2000s, research on multisensory integration focused on the processing of information by each sense independently (Alais, Newell, & Mamassian, 2010). This first research perspective on multisensory integration posited that each sense processes information independently, which is then integrated in other areas of the brain (Treisman & Gelade, 1980). However, studies have suggested that there are multisensory neurons in the brain that are specifically used to integrate multisensory stimuli (e.g., in the superior colliculus) (King & Palmer, 1985; Meredith & Stein, 1983, 1985, 1986a, 1986b, 1987).
1.1 Input (Sensory Information)
Input is first received by the senses (i.e., vision, hearing, taste, olfaction, touch, proprioception) and needs to be processed and integrated. Multisensory integration can be defined as the process by which inputs from two or more senses are combined to create a coherent representation of the world (Chalmers, Howard, & Moir, 2009; Stein, Stanford, & Rowland, 2009, 2014; Welch & Warren, 1980), which can affect perception, decisions, and overt behavior (Stein et al., 2009).
1.2 Implicit Information Processing
1.2.1 Multisensory Integration (Contingencies Match)
Once the input is received by the senses, it is integrated in the corresponding sensory areas of the brain through implicit information processing (i.e., core information processes occurring very rapidly and without awareness). Certain conditions are necessary for stimuli to be integrated in a multisensory manner. This section focuses on two of three general multisensory integration principles (conditions): (1) the temporal principle (Alais et al., 2010; Klasen, Chen, & Mathiak, 2012; Spence & Squire, 2003), and (2) the spatial principle (Alais et al., 2010; King & Palmer, 1985; Klasen et al., 2012; Meredith & Stein, 1986a). The third principle is referred to as the principle of inverse effectiveness, but it is not discussed since it is not included in our model and is thus beyond the scope of this article (see Meredith & Stein, 1983, 1986b; Stein & Meredith, 1993 for a detailed discussion). These principles emerged from previous studies that examined multisensory neurons in cats, monkeys, and rodents (King & Palmer, 1985; Meredith, Nemitz, & Stein, 1987; Stein & Meredith, 1990, 1993). A deeper discussion of the temporal and the spatial principles is necessary given that our model is inspired by them.
1.2.1.1 Temporal Principle
Previous research has suggested that stimuli need to be presented close together in time (temporal principle) to be integrated in a multisensory manner (Alais et al., 2010; Klasen et al., 2012; Spence & Squire, 2003) and for a coherent representation of the environment to be created (Stein & Meredith, 1990). This allows individuals to respond faster and more appropriately to the environment (Stein & Meredith, 1990). When two stimuli are presented close together in time, individuals will perceive them as one event. When there is a significant delay between them, individuals will perceive them as two separate events (Keetels & Vroomen, 2012). Much of the research conducted in humans has focused on temporal windows and the perception of synchronicity. Although studies have shown that stimuli must be presented within a small temporal window to be perceived as one event (or synchronous), the exact size of the temporal window is unclear. Levitin, MacLean, Mathews, Chu, and Jensen (2000) examined the temporal window needed for participants to consider visual and auditory stimuli and tactile and auditory stimuli as synchronous. Specifically, the authors found that participants perceived the presentation of the stimuli as asynchronous when the auditory stimuli were presented 41 milliseconds (ms) before and 45 ms after the visual stimuli. Participants also perceived a delay when the auditory stimuli were presented 25 ms before and 42 ms after the haptic stimuli. While these results are preliminary because they are based on a small sample (N = 8), they suggest that the temporal window should be very small for stimuli to be perceived as synchronous.
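The asymmetric bounds reported by Levitin et al. (2000) can be summarized in a short sketch. The following is a minimal illustration only, with hypothetical function and parameter names; only the millisecond bounds come from the study described above.

```python
# Illustrative sketch only: classifies an audio-visual stimulus-onset
# asynchrony (SOA) against the asymmetric temporal window reported by
# Levitin et al. (2000). Function and parameter names are hypothetical.

def is_perceived_synchronous(soa_ms: float,
                             audio_lead_limit_ms: float = 41,
                             audio_lag_limit_ms: float = 45) -> bool:
    """soa_ms < 0 means the auditory stimulus led the visual stimulus;
    soa_ms > 0 means it lagged. Returns True if the pair falls inside
    the window and would likely be perceived as one event."""
    if soa_ms < 0:  # audio presented first
        return -soa_ms < audio_lead_limit_ms
    return soa_ms < audio_lag_limit_ms  # audio presented after (or together)

# Audio 30 ms before the visual stimulus: inside the window.
print(is_perceived_synchronous(-30))  # True
# Audio 60 ms after the visual stimulus: outside the window.
print(is_perceived_synchronous(60))   # False
```

Note how the window is asymmetric: a slightly larger lag than lead is tolerated, mirroring the 41 ms/45 ms values reported for audiovisual pairs.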
There appears to be a tolerance for greater discrepancies between the presentation of visual and auditory stimuli, at least for the visual perception of movement. For example, Sekuler, Sekuler, and Lau (1997) conducted a study where two identical objects moved across a computer screen toward each other, coincided (i.e., continuous motion, pause for 1 frame, or pause for two frames), and then moved apart. Either no sound was presented during the trial or a sound was presented 150 ms before, during, or after the coincidence. The objects could be perceived as either continuing to move in their original directions after coincidence or as having collided with each other and reversing directions. Participants were exposed to each stimulus combination 20 times, in a random order. The authors found that even if the sound was presented 150 ms before or after the coincidence (i.e., asynchronous presentation), participants were more likely to perceive the objects as having collided and reversed directions than when no sound was presented (Sekuler et al., 1997). In another study, Stone et al. (2001) presented visual and auditory stimuli in asynchronous temporal windows. They found that participants felt the stimuli were presented synchronously when the auditory stimuli were presented 20 ms before and up to 150 ms after the visual stimuli. Therefore, it seems that in some contexts the temporal window can be wider than initially thought and could be task-dependent (van der Stoep, Postma, & Nijboer, 2017).
1.2.1.2 Spatial Principle
Prior studies have suggested that stimuli are more likely to be integrated in a multisensory manner if they are presented close together in space (i.e., spatial principle) (Alais et al., 2010; King & Palmer, 1985; Klasen et al., 2012; Meredith & Stein, 1986a). As with the temporal principle, when stimuli are aligned in space, a coherent representation of the environment is created, which allows individuals to respond faster and more appropriately to it (Stein & Meredith, 1990). The brain occasionally receives conflicting information from different senses, which can lead to perceptual illusions and to one sense biasing or dominating another (van der Stoep et al., 2017). An example is the ventriloquism effect, where visual information biases auditory information (Bertelson & Aschersleben, 1998; Sarlat, Warusfel, & Viaud-Delmon, 2006; Spence & Squire, 2003). In other words, auditory stimuli are perceived as coming from the same location as visual stimuli, even if they are physically coming from two distinct locations. Vision can also bias tactile or proprioceptive information. For instance, if an individual lifts two objects of the same weight but with two different volumes, he or she will perceive the object with the smaller volume as heavier than the one with the larger volume (Cross & Rotkin, 1975; Ellis & Lederman, 1993). Vision seems particularly important to interpret conflicting information in spatial tasks and to give meaning to experiences (Ernst & Bülthoff, 2004).
Although vision is important for perception, it does not always dominate (Ernst & Bülthoff, 2004). Depending on the task, another sense may dominate if the information it provides is more precise. For example, hearing seems to dominate when the task requires a temporal judgment, as it provides more precise information than vision (Ernst & Bülthoff, 2004). When one light flash and several sounds are presented, the light flash can be perceived as several light flashes. This perceptual illusion is stronger when the delay between the light flash and the sounds is 100 ms or less (Shams, Kamitani, & Shimojo, 2002). In addition, when the temporal window is small (i.e., less than 40 ms), auditory stimuli are perceived before visual stimuli (Kanabus, Szelag, Rojek, & Poppel, 2002). Another study showed that when a visual stimulus was preceded by an auditory stimulus, the visual stimulus was perceived significantly faster; but if it was followed by the auditory stimulus, it was perceived significantly later (Fendrich & Corballis, 2001). These results suggest that hearing can dominate vision (i.e., the auditory information can bias the visual information) when a temporal judgment is required.
In short, two important principles of multisensory integration have emerged from the scientific literature: (1) the temporal principle (i.e., stimuli are more likely to be integrated in a multisensory manner if they are presented close together in time) (Alais et al., 2010; Klasen et al., 2012; Spence & Squire, 2003), and (2) the spatial principle (i.e., stimuli are more likely to be integrated in a multisensory manner if they are presented close together in space) (Alais et al., 2010; King & Palmer, 1985; Klasen et al., 2012; Meredith & Stein, 1986a).
1.3 Explicit Information Processing
If sensory stimuli are temporally and spatially congruent, their coherence (logical congruence, i.e., whether the stimuli “make sense”) will be assessed through more explicit information processing. Whether stimuli will be considered coherent also depends on an individual's previous experience and expectations. Skarbez, Brooks, and Whitton (2017, 2018, 2020) and Skarbez, Neyret, Brooks, Slater, and Whitton (2017) used the term “coherence” to refer to the realism and fidelity of a virtual environment (VE), which contribute to the illusion that the events occurring in a VE are actually occurring (plausibility illusion, Psi; a more in-depth discussion will follow in the section “Subjective Experience”). The authors argued that a VE is coherent if it has an internal logical and behavioral consistency and matches user expectations and previous experiences. A VE does not need to exactly represent the physical environment to have a high level of coherence. Similarly, Gilbert (2016) used the term “authenticity.” A VE will have high authenticity if the “affordances and simulations chosen in its implementation” respect user expectations based on their experiences in the physical environment and their intentions in the VE. Although the authors used the terms “coherence” and “authenticity” in the context of VR, they seem applicable to the physical environment as well. When stimuli are assessed as coherent, different perceptual illusions can be induced (e.g., experimentally induced body illusions).
1.3.1 Experimentally Induced Body Illusions
Multisensory integration plays an important role in body representation and body ownership. In this section, different experimentally induced body illusions and the conditions necessary to induce them are presented to prepare the context for immersions in VR and for our model on information processing and presence. These body illusions are examples of perceptual illusions that can be induced both in the physical environment and in VR (e.g., IJsselsteijn, Kort, & Haans, 2006). Body illusions are also examples that show that the temporal and the spatial principles and coherence can be important for inducing perceptual illusions.
Body representation provides information on the shape of body parts and their spatial location to distinguish what is and is not part of our body (Proske & Gandevia, 2012). Body ownership is the feeling that our body is ours (Proske & Gandevia, 2012). The manipulation of visual and tactile or proprioceptive stimuli can lead to body illusions. The rubber hand illusion and the full-body illusion are commonly referred to as body ownership illusions. The most studied illusions include, among others, the rubber hand illusion (the illusion that a rubber hand is our own), the full-body illusion (the illusion that an artificial body is our own), the body swap illusion (the illusion of exchanging our body with that of another individual), and the enfacement illusion (the illusion that the face of another individual is ours) (Serino et al., 2013). Numerous studies have shown that when multisensory integration is manipulated, individuals can appraise these body ownership illusions as believable.
1.3.1.1 Rubber Hand Illusion
Botvinick and Cohen (1998) were the first to report the rubber hand illusion paradigm. To induce this illusion, the participant's physical hand was placed out of sight under a table and a rubber hand was placed on the table where the participant's physical hand should be. Participants were assigned to one of the following two conditions: (1) synchronous stroking of the rubber hand and the physical hand with a paintbrush, or (2) asynchronous stroking of the rubber hand and the physical hand with the paintbrush. If the stroking was synchronous, participants felt that the rubber hand was their own and that they could feel the stroking of the rubber hand. The results showed that the illusion was induced in those who received the synchronous stroking, but not in those who received the asynchronous stroking (Botvinick & Cohen, 1998). These results suggest that a small temporal window between visual and tactile stroking of the physical hand and the fake hand is important to induce the illusion (i.e., the illusion is believable). Other studies have found similar results (e.g., Ehrsson, Holmes, & Passingham, 2005; Ehrsson, Wiech, Weiskopf, Dolan, & Passingham, 2007). Usually, a window longer than 500 ms will not induce the illusion (Zoulias, Harwin, Hayashi, & Nasuto, 2016).
Synchronous visual and tactile stroking is not a sufficient condition to induce the rubber hand illusion (Tsakiris & Haggard, 2005). Additional conditions must be met, to a certain extent, for the rubber hand illusion to be induced. The fake hand must have a realistic orientation and handedness. For example, it cannot be placed at an angle of 180 degrees or be a right hand while the physical hand is a left one, since this orientation and handedness are impossible (Maselli & Slater, 2013). Furthermore, the fake hand must resemble a physical hand (Tsakiris & Haggard, 2005) and be located near the physical hand (Lloyd, 2007; Preston, 2013). However, the rubber hand illusion is flexible. For instance, it is possible to induce it by using an inflated glove. It also works on other limbs (e.g., feet). The illusion will be stronger if the fake limb resembles the physical limb. The illusion can be induced even if the skin color of the fake hand is different from the participant's (Farmer, Tajadura-Jiménez, & Tsakiris, 2012) or if an arm is added and it is hairy or longer than normal (Caola, Montalti, Zanini, Leadbetter, & Martini, 2018). These results suggest that the visual realism of the rubber hand remains to some extent less important than the synchronous presentation of stimuli (temporal congruence) to induce the illusion. Furthermore, these studies suggest that spatial congruence (i.e., the fake hand is near the participant's physical hand) and coherence (logical congruence) between the visual and tactile or proprioceptive stimuli (i.e., when the participant sees the fake hand being stroked, his or her physical hand is also being stroked) help increase the believability of the illusion.
1.3.1.2 Full-Body Illusion
The full-body illusion can be induced in the physical environment (with cameras) or in VR. The procedure for inducing it is similar to the one for the rubber hand illusion (see Petkova & Ehrsson, 2008 and Slater, Spanlang, Sanchez-Vives, & Blanke, 2010 for a detailed procedure). The synchronous presentation of stimuli and the first-person perspective (egocentric) are important to induce the full-body illusion in the physical environment and in VR (Aspell, Lenggenhager, & Blanke, 2009; Petkova & Ehrsson, 2008; Petkova, Khoshnevis, & Ehrsson, 2011; Slater, Spanlang, Sanchez-Vives, & Blanke, 2010). If the third-person perspective (allocentric or from a bird's-eye view) is used, participants will feel that the event is happening to someone else rather than to them. In other words, they will feel like they are seeing someone else instead of themselves (Ehrsson, 2007). If the illusion is induced in VR, it is stronger if the virtual body or mannequin resembles the physical body. For example, the illusion is weaker if participants see a non-corporeal object (Lenggenhager, Tadi, Metzinger, & Blanke, 2007). However, the full-body illusion is also flexible. For instance, one study found that the illusion could be induced by using an invisible body (Kondo et al., 2018). These studies suggest that temporal and spatial congruence and coherence between the visual and tactile or proprioceptive stimuli help increase the believability of the illusion. In other words, when the participant sees the virtual abdomen being stroked, his or her physical abdomen is also synchronously being stroked and the virtual abdomen is aligned with his or her physical abdomen.
In summary, multisensory integration plays an important role in inducing body illusions. It appears that presenting the stimuli within a small temporal window is more important to induce these illusions than high visual realism.
1.3.2 Multisensory Integration of Emotions
In this section, the multisensory integration of emotions is discussed, including the importance of the presentation of coherent (logically congruent) stimuli. This section will help prepare the context for our model on information processing and presence where coherent stimuli are a key component for inducing high presence.
Similarly to sensory stimuli, emotional stimuli are integrated in a multisensory manner (Klasen et al., 2012). The coherence of stimuli is important in the processing of emotions. For example, when individuals are exposed to a phobogenic stimulus (such as a dog, in the case of a dog phobia), their brain receives visual (e.g., appearance of the dog) and auditory (e.g., barking and growling) information that induces fear. However, if their brain receives conflicting or incoherent information (e.g., seeing a dog, but hearing a different animal), they might feel less fearful, or not fearful at all. Another example is an individual suffering from aviophobia taking a plane. His or her brain receives proprioceptive (e.g., turbulence), olfactory (e.g., bad smell in the air), visual (e.g., concern of other passengers), and auditory (e.g., conversations of other passengers) information, which jointly contributes to provoking fear. If the stimuli are believable (coherent), the individual with a phobia is more likely to be fearful of the object, animal, or situation. In short, the brain processes and integrates multisensory stimuli, producing an emotional response.
Many studies that have examined the multisensory integration of emotional stimuli have experimentally manipulated facial expressions and speech (Klasen et al., 2012). Coherent emotional audiovisual stimuli seem to be useful to identify emotions faster and more accurately than a visual or auditory stimulus presented alone. In one experiment (i.e., Collignon et al., 2008), participants were presented several blocks (randomly interleaved) of emotion expressions using (1) visual alone (video clips without any sound), (2) auditory alone (only the sound from the video clips), (3) congruent audiovisual (match between visual and auditory stimuli), or (4) incongruent audiovisual stimuli (mismatch between visual and auditory stimuli). Their task was to categorize the expressions as fear or disgust. The results showed that the participants categorized the expressions faster and more accurately when they were presented the congruent audiovisual stimuli than the visual alone or auditory alone stimuli. Interestingly, the results also showed that the participants relied on the visual stimuli to categorize the expressions when the incongruent stimuli were presented. However, when the reliability of the visual stimuli diminished in the incongruent blocks (i.e., added noise/static to the video clips), the results indicated that the participants relied on the auditory stimuli to determine whether the expression presented was fear or disgust. These findings suggest that vision can dominate when completing perceptual tasks, but also when completing emotional tasks. The authors concluded that congruent stimuli helped the participants categorize the emotion expressions, and that visual dominance appeared to be situation dependent. As we will see later on, the integration of coherent stimuli offers an explanation for people's reactions during VR immersions.
2 Multisensory Integration in Virtual Reality (VR)
Most studies examining multisensory integration and the perceptual illusions that can be induced have been conducted in the physical environment (as opposed to in a virtual environment [VE]). In addition, several of them used light flashes as visual stimuli and clicks as auditory stimuli (e.g., Shams et al., 2002). As a result, they may lack ecological validity. It would be interesting and necessary to study these effects in virtual reality (VR) and to use different stimuli. VR can be defined as a set of computer technologies offering an immersive environment, allowing the user to feel present and interact with this other reality in real time (Wiederhold & Bouchard, 2014). Benefits of VR include being able to expose users, in a safe and controlled environment, to realistic scenarios that are dangerous or not feasible to study in the physical environment, and to simulate tasks that involve costly equipment (Aguinis, Henle, & Beaty, 2001). In this section, two topics are covered: (1) perceptual illusions induced in VR, and (2) models explaining how these illusions are induced. The main objective of this section is to explain the relationship between multisensory integration and VR and to prepare the context for presence, which is a perceptual illusion induced in VR and the main component of our model. The relationship between multisensory integration and presence is then described.
Globally, research suggests that the process of integrating stimuli from a VE and the physical environment is similar. When the brain receives conflicting information from different senses, perceptual illusions can be induced (van der Stoep et al., 2017), similar to those in the physical environment. For example, visual information can bias vestibular information in VR. When individuals are immersed in VR, they may have the perception of movement (Lécuyer, 2017), even if the physical environment is not moving. They may have this impression since the VE updates itself visually according to their head movements and the movements they make with the controllers, but their physical body remains in the same location. This mismatch can lead to cybersickness (Allison, Harris, Jenkin, Jasiobedzka, & Zacher, 2001; McCauley & Sharkey, 1992), which resembles motion sickness in the physical environment. Research shows that a delay of a few ms between stimuli in the VE can be perceived by users and affect their experience. For instance, researchers suggest that a delay of 17 ms or more between visual stimuli and participants' head movements in a VE will be perceived (Adelstein, Lee, & Ellis, 2003). In another study, Meehan, Razzaque, Whitton, and Brooks (2003) compared the effect of an added delay of 50 ms and of 90 ms on presence in an anxiety-provoking VE. Self-report data from 164 participants were analyzed and the results showed that presence was lower in the condition with the added delay of 90 ms. These results suggest that the temporal principle (Alais et al., 2010; Klasen et al., 2012; Spence & Squire, 2003) also applies to VR, and that delays are detrimental to presence. In addition, studies have shown that the presentation of visual and tactile or proprioceptive stimuli should be synchronous to induce body illusions in VR (e.g., Ehrsson et al., 2007; González-Franco, Peck, Rodríguez-Fornells, & Slater, 2014; Lenggenhager et al., 2007).
These results suggest that individuals integrate multisensory information from VEs and the physical environment in a similar way. In other words, it is important for stimuli to be presented close together in time to be integrated in a multisensory manner in VR. If there is a delay between virtual stimuli, the users' response in the VE should be similar to their response in the physical environment.
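The latency findings above can be made concrete with a small sketch. This is an illustration only, assuming hypothetical function names and made-up per-frame latency values; the roughly 17 ms perceptibility threshold is the value reported by Adelstein, Lee, and Ellis (2003).

```python
# Illustrative sketch only: flags head-tracking (motion-to-photon)
# latencies that users are likely to notice, using the approximately
# 17 ms threshold reported by Adelstein, Lee, and Ellis (2003).
# All names and the sample latencies below are hypothetical.

PERCEPTIBLE_LATENCY_MS = 17.0

def flag_perceptible_frames(latencies_ms):
    """Return the indices of frames whose latency meets or exceeds the
    threshold, i.e., frames where the visual update may be perceived
    as lagging behind the user's head movement."""
    return [i for i, latency in enumerate(latencies_ms)
            if latency >= PERCEPTIBLE_LATENCY_MS]

frame_latencies = [11.2, 12.5, 18.9, 14.0, 21.3]  # made-up per-frame values
print(flag_perceptible_frames(frame_latencies))   # [2, 4]
```

On this reading, only the frames at indices 2 and 4 would risk being noticed; a system that keeps every frame under the threshold should, per the studies above, better preserve presence.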
2.1 The Role of Multisensory Integration in Inducing Perceptual Illusions in VR
Although multisensory integration theories exist in neuroscience and psychology, few are specific to VR. Riva et al. (2018) and Riva, Wiederhold, and Mantovani (2019) proposed that VR is effective in assessing and treating various health problems since it creates embodied simulations of the body thanks to the multisensory information received and integrated by the brain. These simulations make it possible to predict and represent the actions and emotions of users. For users to feel present, the body simulation created by the technological system must update according to the users' movements in the VE and must provide the expected sensory information to the brain. For example, there should be no discrepancy between the users' head movements and the visual changes perceived in the head-mounted display. This theory is interesting to explain how the brain processes virtual multisensory information related to embodiment and how users immersed in VR come to feel and act as if they are actually in the VE and no longer in the physical environment (e.g., in a laboratory). This is known as presence, or more specifically, the perceptual illusion of “being there” in the VE (Heeter, 1992).
Gonzalez-Franco and Lanier (2017) proposed a neuroscientific model, inspired by previous research, to explain how illusory experiences are induced in VR. Their model has three components: (1) bottom-up multisensory processing (Blanke, 2012; Calvert, Spence, & Stein, 2004), (2) sensorimotor self-awareness frameworks (Gallagher, 2000), and (3) top-down prediction manipulations (Haggard, Clark, & Kalogeras, 2002).
Bottom-up multisensory processing (Blanke, 2012; Calvert et al., 2004) refers to the strategy used by the brain to integrate sensory stimuli received from multiple senses (Gonzalez-Franco & Lanier, 2017). If the stimuli received are temporally congruent and coherent (logically congruent), the brain will be more likely to appraise them as correct, whereas if they are temporally incongruent and incoherent, the brain could “disregard” one of the stimuli, appraising it as being “incorrect” (e.g., if there is a delay between a virtual character's lip movements and the words he is pronouncing, the brain could “disregard” the visual stimulus in order to understand the words).
Sensorimotor self-awareness frameworks (Gallagher, 2000) refer to the comparisons made by the brain between the multisensory stimuli it has received and the predictions it has made about the actions that should occur (Gonzalez-Franco & Lanier, 2017). For instance, if the users move their head and the VE adjusts to their head movements, they will feel a stronger VR illusion compared to if there were several delays in the tracking of their head movements. Similarly, if they extend their virtual hand by 10 cm to touch a virtual wall, the illusion will be stronger if they visually perceive that the contact with the virtual wall is made at 10 cm than if it is made at 5 cm (and consequently they are unable to touch the virtual wall) or at 15 cm (and their virtual hand consequently passes through the virtual wall). When multisensory stimuli correspond to the anticipated actions, the users will likely feel a strong VR illusion.
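The virtual-wall example above amounts to comparing a predicted outcome with a perceived one. A minimal sketch of that comparison follows; the tolerance value and all names are hypothetical, and this is not a description of Gonzalez-Franco and Lanier's actual implementation.

```python
# Illustrative sketch only: compares a predicted reach distance with the
# visually perceived contact distance, as in the 10 cm virtual-wall
# example above. The tolerance and all names are hypothetical.

def prediction_matches(predicted_cm: float,
                       perceived_cm: float,
                       tolerance_cm: float = 1.0) -> bool:
    """True if the perceived contact distance is close enough to the
    predicted one for the sensorimotor comparison to succeed."""
    return abs(predicted_cm - perceived_cm) <= tolerance_cm

print(prediction_matches(10.0, 10.0))  # True: contact occurs where expected
print(prediction_matches(10.0, 5.0))   # False: hand stops short of the wall
print(prediction_matches(10.0, 15.0))  # False: hand passes through the wall
```

When the comparison succeeds across modalities and actions, the model predicts a strong VR illusion; repeated mismatches weaken it.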
Top-down prediction manipulations (Haggard et al., 2002) allow some flexibility to the illusion. The user will feel a strong VR illusion if the multisensory stimuli received match the predictions made by the brain based on the user's previous experience and expectations (Gonzalez-Franco & Lanier, 2017). For instance, if there is a delay of a few ms between two virtual stimuli (a lightning flash and the sound of thunder), the stimuli may be perceived as simultaneous or not, depending on the user's previous experience in the physical environment and in VR.
2.2 Subjective Experience
The product of these integration processes will lead to users subjectively feeling as if the experience is actually occurring and as if they are actually in the place depicted in the VE (Slater, 2009, 2018; Slater, Spanlang, & Corominas, 2010). Three models of VR (i.e., Lombard & Ditton, 1997; Gonzalez-Franco & Lanier, 2017; Riva et al., 2018; Riva et al., 2019), which seem to complement each other well, explain how information processing leads to presence, and why users react as if they were actually in the VE. Lombard and Ditton (1997) claim that when users feel present, they “forget” that they are immersed in a VE and that their experience is created by a medium (VR equipment). Gonzalez-Franco and Lanier (2017) explain that multisensory integration leads to various illusions in VR. Riva et al. (2018) and Riva et al. (2019) suggest that embodiment is important to feel present in a VE.
Lombard and Ditton (1997) proposed that presence is a perceptual illusion of non-mediation: users believe that they are in another reality and act accordingly without perceiving or recognizing that this illusion is created by technology (i.e., that they are immersed in VR). In other words, the brain is fooled, and users “forget” that the experience in the VE is created by technology. Users feel present, or like they “are there” in the VE (Heeter, 1992). This is interesting because many (if not most) individuals tend to respond to stimuli in the VE, even though they are consciously aware that they are not physically in that location (Slater, 2003). For example, users will duck to avoid an object flying toward them (Sanchez-Vives & Slater, 2005), and will often walk around a pit instead of over it, even if they know that they will not fall in the physical environment (e.g., Meehan, Insko, Whitton, & Brooks Jr, 2002; Meehan et al., 2003; Sanchez-Vives & Slater, 2005; Slater, Usoh, & Steed, 1995; Usoh et al., 1999). Individuals will often mention after an immersion in VR that they consciously knew that the experience was not real, but they still responded as if it had been.
There appear to be few neuroscientific and information processing models that focus specifically on presence and multisensory integration in VR. As noted above, Riva et al. (2018) and Riva et al. (2019) presented a model to explain the relationship between embodiment and presence in VR. Their model is interesting because it shows the importance of the interaction between the user's actions, the VE's responses, and the sensory information received by the brain. Immersion in VR is a rich and rapid interactive process between the user and the technological system, which must update the body simulation it created according to the user's actions in the VE and provide the expected sensory information to the brain (Riva, Mantovani, & Bouchard, 2014). Riva et al.'s (2018, 2019) model could be extended by including a discussion of the relationship between multisensory integration principles and presence (see the section “A Novel Integrated Information Processing Model of Presence”). Gonzalez-Franco and Lanier's (2017) model explains well how different perceptual illusions are induced in VR. A more explicit description of the relationship between multisensory integration and presence could be an interesting addition to their model. The following paragraphs delve into the details of their model and attempt to explain the relationship between multisensory integration and presence using their model. As we mentioned previously, Gonzalez-Franco and Lanier's (2017) model comprises three components: (1) bottom-up multisensory processing (Blanke, 2012; Calvert et al., 2004), (2) sensorimotor self-awareness frameworks (Gallagher, 2000), and (3) top-down prediction manipulations (Haggard et al., 2002).
The first component of their model stipulates that multisensory stimuli are processed in a bottom-up manner (Blanke, 2012; Calvert et al., 2004). Stimuli can be presented in a VE involving multiple senses (e.g., visual, auditory, tactile, proprioceptive, olfactory) and the brain must integrate them (Gonzalez-Franco & Lanier, 2017), which can impact presence. Slater (2009, 2018) and Slater, Spanlang, and Corominas (2010) argued that presence has two components: the place illusion (PI—the feeling of "being there" in the VE) and the plausibility illusion (Psi—the feeling that the events in the VE are actually occurring), which leads users to behave accordingly. Users will tend to respond to stimuli in the VE implicitly, even if they consciously know they are in the physical environment (e.g., in a laboratory) (Sanchez-Vives & Slater, 2005; Slater, 2003; Slater & Usoh, 1993). For example, when they see a table in a VE, they will tend to naturally walk around it instead of through it, even if they know that there is no physical table in the room. The presentation of well-synchronized and coherent multisensory stimuli can contribute to the feeling that the events in the VE are actually occurring (the plausibility illusion) and to user behavior (Gilbert, 2016; Skarbez, Brooks, & Whitton, 2017, 2018, 2020; Skarbez, Neyret, et al., 2017; Slater, 2009, 2018; Slater, Spanlang, & Corominas, 2010), as these stimuli increase the believability of the VE (i.e., the VE makes sense to the user). The believability of the VE and the feeling that the scenario is actually occurring in turn increase the feeling of "being there" in the VE (the place illusion).
Gonzalez-Franco and Lanier's (2017) model then states that the brain makes comparisons between the multisensory stimuli it has received and the predictions it has made about the actions that should occur, which are referred to as sensorimotor self-awareness frameworks (Gallagher, 2000). In other words, when multisensory stimuli match the predicted actions, the brain will be more likely to be fooled and the user will be more likely to feel high presence. For instance, if users (who expect a VE to respect the laws of physics) drop a plate on the floor in the VE and it breaks, they will feel higher presence than if the plate had not broken, since they expected it to break as it would in the physical environment. It is possible that the plausibility illusion leads to the place illusion. In other words, when the VE appears believable, users will feel like the scenario depicted in the VE is actually occurring, respond to stimuli implicitly, and subsequently draw a conclusion about where they are. When the VE reacts appropriately to their actions based on their expectations and previous experience (e.g., if users drop a virtual plate on the virtual ground, it breaks), they will feel as if they "are there" in the VE.
Finally, Gonzalez-Franco and Lanier's (2017) model specifies that top-down prediction manipulations (Haggard et al., 2002) allow some flexibility to illusions in VR. Therefore, even if the visual realism of a VE is reduced, presence may be high since the VE can still represent a physical situation in a realistic way. In other words, if the plausibility illusion is induced, then the place illusion can also be induced. A user can feel as if he or she is "really" in the VE since the VE remains believable. It does not need to identically resemble the physical environment. Indeed, research has suggested that visual realism is important to a certain extent to increase presence (e.g., Hvass et al., 2017a, 2017b), but presence can remain elevated even if visual realism is reduced. For example, Dinh, Walker, Hodges, Song, and Kobayashi (1999) studied the effect of visual realism and multisensory stimulation on presence and memorization in VR with 322 participants. The participants were randomly assigned to one of sixteen conditions where visual realism was either high (high-resolution textures and local lighting) or reduced (texture resolution lowered to 25% of the high-resolution condition, with ambient lighting), with or without auditory (ambient sounds), olfactory (aroma of coffee), and tactile (wind and temperature changes) stimuli. They were immersed in a virtual office, which included a small and a large office, a copier room, a reception area, a hallway, a bathroom, and a balcony. The results showed that the addition of auditory, tactile, and olfactory stimuli increased participants' presence and their memorization of the VE. Improved visual realism, however, did not increase their presence or their memorization of the VE. This could be due to the VE still being believable and representing a scenario that could be encountered in the physical environment.
In short, the brain appears to integrate virtual stimuli in a similar way to stimuli from the physical environment, but with some flexibility. Provided the VE looks believable (even if it does not exactly resemble the physical environment), users can feel like they "are there" in the VE.
As temporal windows are important for inducing perceptual illusions and for integrating multisensory information, they are likely also important for inducing presence. When virtual stimuli are temporally congruent and coherent, the brain will be fooled into appraising the experience in the VE as believable and actually occurring (plausibility illusion, Gilbert, 2016; Skarbez, Brooks, & Whitton, 2017, 2018, 2020; Skarbez, Neyret, et al., 2017; Slater et al., 2009, 2018; Slater, Spanlang, & Corominas, 2010), which will lead to the feeling that one is actually in the VE (place illusion). It is possible that a significant delay between virtual stimuli (e.g., visual and proprioceptive stimuli) may decrease or break presence (plausibility and place illusions). For instance, if a VE does not update itself when users move their head, presence could be reduced or broken. If users no longer feel present or feel less present in a VE that represents the physical environment, they could react differently in the VE than they would in the physical environment (e.g., walking through virtual walls) since the believability of the VE is reduced. In other words, if the plausibility illusion is reduced or not induced, the place illusion (the feeling of “being there” in the VE) will be reduced or not induced either. Unfortunately, to the authors’ knowledge, the relationship between the temporal windows for multisensory integration and presence (including plausibility and place illusions) has not yet been studied.
2.3 A Novel Integrated Information Processing Model of Presence
2.3.1 Input (Sensory Information)
The proposed model states that the senses first receive sensory information from virtual and/or somatosensory stimuli (if the immersion is not complete, the senses will also receive sensory information from the physical environment). Information is then processed and integrated in the corresponding sensory areas of the brain.
2.3.2 Implicit Information Processing
2.3.2.1 Multisensory Integration (Contingencies Match)
The brain will compare the information received to what is normally expected, based on previous experience (Gonzalez-Franco & Lanier, 2017). Based on previous experiences in the physical environment and in VR (if the individual has previously been immersed in VR), the brain expects temporal congruency between all relevant stimuli, including visual, auditory, proprioceptive, and sensorimotor stimuli (Alais et al., 2010). As we have seen above in this article, this normal information processing applies in vivo as well as in VR. If there is a small yet significant delay between stimuli, individuals will perceive them as two separate events (Keetels & Vroomen, 2012). This multisensory integration processing is implicit: individuals do not consciously realize that it is occurring and cannot verbalize what is occurring because such a delay is rapid and automatically processed (refer to the introduction for more details about temporal windows and synchronicity). If the delay is too long or the spatial location too discrepant to support a coherent multisensory integration, it can be consciously detected and integration will not occur (Alais et al., 2010; Keetels & Vroomen, 2012; Klasen et al., 2012). Some temporal mismatch can lead to unwanted negative side effects (commonly referred to as cybersickness; Allison et al., 2001; McCauley & Sharkey, 1992) yet still allow the conclusion that the experience is believable, provided the overall meaning of the stimuli remains coherent. If multisensory integration fails, the experience will not be appraised as believable, which will significantly limit presence.
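The temporal-window logic described above can be sketched in code. The following is an illustrative toy, not an implementation from the literature: the 100 ms window is an assumed placeholder, since empirically reported binding windows vary by modality pair, stimulus complexity, and task (Keetels & Vroomen, 2012).

```python
# Toy sketch (assumption-laden, for illustration only): decide whether two
# stimulus onsets fall within a hypothetical temporal binding window and
# would therefore be implicitly integrated as a single multisensory event.

def integrates(onset_a_ms: float, onset_b_ms: float,
               window_ms: float = 100.0) -> bool:
    """Return True if the onset asynchrony lies inside the binding window,
    i.e., the two stimuli would plausibly be bound into one event."""
    return abs(onset_a_ms - onset_b_ms) <= window_ms

# A 40 ms visual-auditory asynchrony is integrated under this toy window;
# a 250 ms asynchrony would be perceived as two separate events.
print(integrates(0.0, 40.0))   # True
print(integrates(0.0, 250.0))  # False
```

The single-threshold design deliberately simplifies: real binding windows are asymmetric (e.g., audio lagging video is tolerated more than the reverse) and graded rather than all-or-none.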
2.3.3 Explicit Information Processing
Once the sensory information has been integrated as temporally and spatially congruent, the brain will appraise the believability of the overall experience. The ongoing information processing becomes more explicit and continues to rely on a conception matching process in which the flow of sensory information is compared with expectations based on previous experience. Templates of previous experience are more complex than at the implicit level and involve behaviors and properties of objects and characters from the physical environment (Gilbert, 2016) and from the VE itself (i.e., the internal logical consistency or coherence of the VE) (Hofer, Hartmann, Eden, Ratan, & Hahn, 2020; Skarbez, Brooks, & Whitton, 2017, 2018, 2020; Skarbez, Neyret, et al., 2017; Slater, 2009, 2018; Slater, Spanlang, & Corominas, 2010). Through bottom-up and top-down processes, the currently experienced situation is expected to match internal representations of the physical environment and of the VE. In VR, a VE will be appraised as believable when the sensory information is congruent (temporally and spatially) and coherent (logically congruent) and matches expectations about the appearance and behavior of virtual objects or characters (Gonzalez-Franco & Lanier, 2017). In a pilot experiment, Slater and Usoh (1993) found that many participants felt reduced presence because the VE did not respect the laws of physics (e.g., participants could walk through walls, gravity only partially worked) and moving in the VE was not "natural" (i.e., using a mouse instead of physically walking). Some participants mentioned being disappointed that when objects fell to the ground from a height, they did not break or produce a corresponding sound. The authors suggested that participants expected the VE to match the physical environment, and when it did not, there was a mismatch between their expectations and their experience in the VE, which resulted in lowered presence.
In other words, participants could have felt reduced presence because the VE was not believable.
In a qualitative experiment, Ghani, Rafi, and Woods (2019) immersed 20 participants in a VE that replicated a physical mosque and some of its surroundings situated in a city in Malaysia. The VE contained animated avatars and ambient sounds. During interviews, some participants who had visited the physical mosque reported that its virtual version was "slightly unrecognizable" as its surroundings lacked certain details from the physical environment. Although this experiment was qualitative, the results suggest that a user's previous experience in the physical environment can impact the believability of a VE and thus the plausibility and place illusions. In other words, the mismatch between some of the participants' past memories (and expectations) and their experience in the VE diminished their feeling of "being there" at the mosque (place illusion) due to decreased believability and a weakened feeling that the experience was actually occurring (plausibility illusion). An encounter with a virtual cat (where a user expects to be immersed in a VE that resembles the physical environment) can be used as another practical example. If the input information is based on coherent virtual visual information (e.g., a virtual black animal shaped like a cat) and auditory information (e.g., meowing), information processing will lead to the conclusion that the object is a cat. On the other hand, if the information processed is incoherent with a priori expectations (e.g., a virtual black animal shaped like a cat that barks), the conclusion will likely be to appraise the virtual object as not very believable. If the same situation is experienced in a pet store, the incoherency could be dismissed and assumed to come from another animal in the room. If the same situation is experienced by a person suffering from a cat phobia, the perceived threat may be high enough to warrant dismissing incoherent situations in favor of protecting oneself from danger.
For instance, if a meowing virtual black quadruped shaped like a cat is encountered in a VE representing the surface of the moon, a person with cat phobia might still react with fear even though the believability of the scene is appraised as low. This information processing clearly involves more explicit and complex integration than the automatic integration of multisensory information. The study conducted by Usoh et al. (1999) is another example that suggests the believability of an experience can influence presence. The authors found that participants felt higher presence when they physically walked from one virtual room to another than when they walked-in-place (i.e., they did not physically walk, but instead moved their head in the direction that they wanted to go and their virtual body walked there) or when they flew (i.e., same procedure as walking-in-place, but they flew to the location). It is probable that participants expected to travel in the VE the same way they would in the physical environment; because humans physically walk to reach a destination and cannot fly unaided, participants felt reduced presence in the walking-in-place and flying conditions.
A believable VE will also influence the processing of further immersions in VR and change expectations of how a VE should look and respond (Gonzalez-Franco & Lanier, 2017). For example, if users were exposed to a believable large spider that walks in a particular way in a VE, they might expect spiders to look and respond in that same manner in another VE containing spiders, or in the same VE at a different time. Similarly, if users were immersed in a believable VE in which they had three arms, they could expect to have three arms in a different VE or in the same VE at a different time. What is experienced in VR can be significant enough to create the illusion of additional limbs (e.g., Won, Bailenson, Lee, & Lanier, 2015). Won et al. (2015) immersed participants in a VE where they had to hit cubes displayed in three locations in front of them. One location was farther away from them and that cube could be hit by moving forward to reach it. Instead of moving forward, participants in the third limb condition could reach the cube using a virtual third arm that was longer and articulated by rotating their wrists. Although the third limb was unrealistic, participants learned to control it and hit more cubes than when they had two arms. Interestingly, their performance also improved over time: they hit more cubes and at a faster rate during later trials than earlier ones. It seems that participants adapted and learned to use the third arm, despite the fact that they only had two limbs in the physical environment. In another study, Steptoe, Steed, and Slater (2013) examined whether participants could learn to control a virtual tail to block virtual green particle beams. Participants were assigned to either a condition where the tail moved synchronously with their physical hip movements or a condition where the tail moved asynchronously with their physical hip movements.
They could block the particle beams with their hands or feet, but some of them were more easily reached with the tail. The results revealed that participants in the synchronous condition felt higher body ownership and agency of the tail than those in the asynchronous condition. Participants in the synchronous condition were also able to successfully control the tail and performed better than those in the asynchronous condition. Furthermore, their performance improved over time, and in the final quarter of the game, they were able to better coordinate the movement of their hands and tail. These results support Won et al.'s (2015) findings and suggest that participants can adapt to having additional body parts, even though this is not possible in physical reality.
These examples illustrate that in VEs where contingency relationships are synchronous, and despite the conditions being unrealistic, the scenario can still be appraised as believable and can update existing expectations about how a VE should look and respond. These expectations are explicit at first and, with practice, become implicit through learning (e.g., participants had to think about the movements to control the third arm or the tail, but they were eventually able to automatically control them). These updated expectations about contingency relationships (e.g., if participants move their hips, the virtual tail will synchronously move) in turn affect believability (when they move their hips, the tail moves the way it is supposed to), presence (feeling like one is actually moving the tail and is in the VE), and behavior in the VE (moving the tail and using it to complete the task).
Interestingly, it seems possible to change participants’ expectations by manipulating the instructions given to them. For instance, Bouchard, St-Jacques, Robillard, and Renaud (2008) exposed 31 participants with snake phobia to an anxiety-inducing VE and to a non-anxiety-inducing VE (randomly assigned order). The participants’ task was to explore the VE (a desert). In the anxiety-inducing VE, participants were informed that there were poisonous, dangerous, and aggressive snakes hidden in the sand and that they would not attack them, but they should be on guard (the VE did not actually contain snakes). In the non-anxiety-inducing VE, participants explored the same environment, but were informed that there were no snakes (the VE did not contain any snakes either). Data from a verbal measure showed that although participants were not exposed to virtual snakes in either condition, their presence was higher when they expected to be exposed to them. During the debriefing, most participants verbally indicated that they behaved differently in the anxiety-inducing VE (i.e., they were not able to explore or control the VE because they were afraid of encountering a snake). It is possible that the expectation of encountering snakes coupled with being immersed in an environment where snakes could actually be found in the physical environment (i.e., a desert) increased the believability of the experience and participants’ feeling that the events were actually occurring (plausibility illusion) and of being in the VE (place illusion), which affected their behavior. The results of this study seem to lend support to the hypothesis that participants’ expectations have an effect on believability and on the plausibility and place illusions.
Embodiment (feeling like one's physical body is substituted by a virtual one, Spanlang et al., 2014) will also contribute to the explicit processing of information appraised as believable and is induced by sensory information relevant to the virtual body. The virtual body can be visible in the virtual space, in the form of an avatar, but it does not have to be. If, when a user moves his or her physical body, the corresponding experience in the VE is consistent, the user can feel embodied (Riva et al., 2018; Riva et al., 2019). In the physical environment, one does not have to see one's own hand under the table to grab an object on the floor. The match between the proprioceptive experience and touching the object is enough to feel that it is one's own body in action. In VR, a match between interoceptive and somatosensory information about the physical body's location and actions, and the impact of the corresponding actions in VR, will lead to embodiment when congruent with past experience (Haans & IJsselsteijn, 2012). Having a visible virtual body adds to this experience but is not essential. Temporal and spatial congruency and coherence (the stimuli must be believable) between visual and tactile or proprioceptive stimuli are required and sufficient to induce embodiment (Aspell et al., 2009; Botvinick & Cohen, 1998; Carey, Crucianelli, Preston, & Fotopoulou, 2019; Lloyd, 2007; Preston, 2013; Riva et al., 2018; Riva et al., 2019). Embodiment can also impact whether virtual stimuli are considered believable, and its role is highlighted at a later (explicit information processing) stage than the implicit multisensory integration. Embodiment involves the perception that the body belongs to the user and provides a sense of agency in the VE. It contributes to the believability of the experience and is also reinforced by a believable experience (Riva et al., 2018; Riva et al., 2019).
For example, if users feel embodied, they feel like it is their head rotating to examine the surroundings and that they are moving in the virtual space.
2.3.4 Subjective Experience
2.3.4.1 Plausibility Illusion (Psi)
The resulting plausibility illusion (Psi) will be induced if the VE is believable based on the conception matching process described previously. This is what Slater (2009, 2018) and Slater, Spanlang, and Corominas (2010) refer to as the plausibility illusion (i.e., feeling that the events in the VE are actually occurring). If users appraise the virtual scenario as believable and actually occurring (e.g., a virtual cat walking on four legs and meowing instead of walking on two legs and speaking in English), their behavior in the VE will resemble or be identical to their behavior in the physical environment (e.g., they could pet the virtual cat, or run away if they are phobic). If users are instead immersed in a fantastical (and believable) VE, it is likely that their behavior will resemble their behavior from a previous immersion in that VE (e.g., if users were immersed in a fantastical VE where cats spoke in English and the users spoke to one, the users would probably speak to the cat again if it was encountered in a future immersion in that VE).
Furthermore, if the virtual scenario is experienced as actually occurring, it further strengthens the processing of stimuli as believable and matching with expectations (i.e., the plausibility illusion affects the concept “Believable”) (Gilbert, 2016; Skarbez, Brooks, & Whitton, 2017, 2018, 2020; Skarbez, Neyret, et al., 2017; Slater et al., 2009, 2018; Slater, Spanlang, & Corominas, 2010). For example, if trainees are flying a virtual aircraft and feel that they are actually flying it, they will likely find that flying it is believable. Interestingly, the virtual aircraft does not need to exactly resemble a physical one to lead to positive training outcomes (Caro, Corley, Spears, & Blaiwes, 1984).
It is important to restate the distinction between the concept of processing information as "believable" and the plausibility illusion. Processing stimuli as believable refers to the process of comparing information with a priori templates of rules and a representation of the world and how it functions. The plausibility illusion refers to the consequent feeling that the events are actually occurring (i.e., the extent to which the virtual scenario is experienced as actually occurring) (Slater, 2009, 2018; Slater, Spanlang, & Corominas, 2010). The strength of the plausibility illusion will be related to how believable the VE is. It is important to note that a VE could be believable without necessarily being realistic: cartoonish looking furniture might not appear realistic, but a user may still walk around it to avoid colliding with it.
2.3.4.2 Place Illusion (PI)
The place illusion (PI) will be induced if the users feel like the virtual scenario is actually occurring (plausibility illusion). When it is induced, users will feel like they are “really there” in the VE (Heeter, 1992) and “forget” that the VE is created by VR equipment (Lombard & Ditton, 1997). The users can be consciously aware that they are in the physical environment (e.g., in a laboratory), but also feel like they are actually in the VE (Sanchez-Vives & Slater, 2005; Slater 2003; Slater & Usoh, 1993). This is what Slater (2009, 2018) and Slater, Spanlang, and Corominas (2010) call the place illusion (the feeling of “being there” in the VE). If the users feel like the VE is unbelievable and that the virtual events are not actually occurring, then they will feel like they are in the physical environment and not in the VE. In other words, the strength of the place illusion will be associated with the strength of the plausibility illusion and the believability of the VE. If the users feel like the VE contains both some unbelievable and believable components based on their expectations, then the plausibility and place illusions could still be induced but should not be very strong. For instance, if they expect to be immersed in a virtual city that resembles a physical city and respects the laws of physics but realize that they can fly in the virtual version of the city, it is likely that they will feel like they are not actually flying in the city and feel a reduced sense of “being there” in the city. If the users feel like the VE is completely unbelievable based on their expectations (e.g., the users expect to be unable to walk through virtual doors and walls but discover that they must do so to move to the next room), then the plausibility and place illusions will likely be very low (i.e., the users will feel like they are not actually walking through walls and doors and like they are in the physical environment instead of in the VE). 
While the plausibility illusion impacts the place illusion, the reverse is also true: if users believe that they are actually in the VE, they will further feel like the scenario depicted in it is actually occurring.
Few studies have been conducted to specifically study the place and plausibility illusions. In one experiment, Skarbez et al. (2018) examined whether self-report measures could distinguish between the effects of the plausibility and place illusions (the results related to the physiological measures are not reported here, see their article for the full results). To study the plausibility illusion (Psi), the authors manipulated the levels of coherence, which were divided into two concepts (i.e., physical coherence and narrative coherence). Specifically, participants were informed that they needed to complete certain tasks to be able to move on to the next stage of the experiment. In the high Psi condition, the laws of physics applied (physical coherence) and the instructions were valid (narrative coherence), but in the low Psi condition, the laws of physics did not apply, and the instructions were false. To examine the place illusion (PI), the authors manipulated levels of immersion: either the head-mounted display had a field-of-view of 60° diagonal and spatial sound cues and passive haptics were added (high PI) or the head-mounted display had a field-of-view of 30° diagonal and no spatial sound cues or passive haptics were added (low PI). Thirty-two participants were randomly assigned to one of four conditions: (1) high PI-high Psi, (2) high PI-low Psi, (3) low PI-high Psi, or (4) low PI-low Psi. Participants completed five stages in the VE: (1) participants played a Simon-like memory game to familiarize themselves with the VE, (2) practice room where participants dropped balls into containers with a target, (3) office-like room where participants dropped balls onto targets, (4) pit-like room where participants dropped balls onto targets on the ground floor, and (5) participants returned to the first room and played the Simon game again. 
Participants completed the Witmer-Singer Presence Questionnaire (Witmer & Singer, 1998) and a modified version of the Slater-Usoh-Steed Presence Score (Usoh, Catena, Arman, & Slater, 2000) post-experimentation. The results showed that several subscales of the Witmer-Singer Presence Questionnaire seemed to distinguish between immersion and coherence and responded to higher levels of immersion as a main effect. The Slater-Usoh-Steed Presence Score appeared to respond to increased immersion as a main effect, and the scores were higher in the matched conditions (low PI-low Psi and high PI-high Psi) than in the unmatched conditions (low PI-high Psi and high PI-low Psi). Neither questionnaire appeared to respond to increased coherence as a main effect. However, the total score on both questionnaires increased in the high PI-high Psi condition. These results are interesting and suggest that in future research perhaps different measures should be used to measure the place and plausibility illusions, and additional factors related to immersion and coherence could be manipulated.
The meaning given to the place and the objects in the VE will affect the user's implicit (automatic, e.g., rotating one's physical shoulders to pass through narrow virtual doors, Lepecq, Bringoux, Pergandi, Coyle, & Mestre, 2009) and explicit behavior (conscious, e.g., confronting a feared stimulus, Emmelkamp et al., 2002). Two users who are immersed in the same VE could associate a different meaning to the place and the objects in the VE and therefore behave differently. For example, if a user who is suffering from acrophobia is exposed to a pit and feels that the scenario is actually occurring and that he or she is actually in the VE, that person will likely feel a high level of fear and avoid looking into it because it is perceived as a threat. On the other hand, a user who does not fear heights will likely be able to look into the pit—even if he or she feels that the scenario is actually occurring and that he or she is actually in the VE—because the pit is not perceived as a threat. Avoiding collisions with virtual furniture could be another example of how two users could attribute a different meaning to a VE. Users who believe that they are in the VE and that the virtual events are actually occurring will likely implicitly avoid walking through virtual furniture because they believe that it is solid, and collisions with objects are normally avoided, as they would be in the physical environment. In contrast, users who feel that they remain in the physical environment and that the virtual events are not actually occurring will likely walk through the virtual furniture because they believe that it is not actually there and that they cannot be harmed by a collision. If the plausibility and place illusions are sufficiently strong, users can attribute meaning to the place and the objects in the VE, which will affect their behavior.
Empirical studies also appear to indicate that two users can give a different meaning to the same VE. In one study, Simon, Etienne, Bouchard, and Quertemont (2020) compared alcohol cravings of 18 heavy drinkers and 22 occasional drinkers following an immersion in a virtual bar with alcoholic beverages. The results revealed that the higher the perceived ecological validity (i.e., the feeling of believability, realism, and naturalness of the VE—this corresponds to the plausibility illusion), the higher the alcohol craving levels in heavy drinkers but not in occasional drinkers. The authors suggested that perceived ecological validity moderates the relationship between alcohol consumption in the physical environment and cravings following the immersion in the VE. These results suggest that the believability and perceived realism of the experience and of the VE led heavy drinkers to crave alcohol as they would in the physical environment. However, this was not the case in occasional drinkers since they do not attribute the same meaning to alcohol. In another study, Quintana, Nolet, Baus, and Bouchard (2019) found that the implicit perception of others’ scent affected participants’ perception and impression of an avatar in a virtual bar. The avatar did not speak or move and had a neutral facial expression. Specifically, the participants who were exposed to a fear-related scent felt more anxious, which led to reduced interpersonal trust toward the avatar compared with participants who were exposed to a joy-related scent (between-subjects design). These results show that participants attributed a different meaning to the avatar in the VE (i.e., he is threatening or not), which affected their anxiety levels and their level of interpersonal trust.
A user could also attribute different meanings to the same VE at different times. For instance, in a study conducted by Mühlberger, Sperber, Wieser, and Pauli (2008), 34 participants with arachnophobia completed a behavioral avoidance test in VR before and after eight exposure trials, as well as after each trial. Participants completed questionnaires on their fear of spiders, their beliefs about spiders, trait anxiety, their subjective distress levels, and bodily symptoms experienced during the behavioral avoidance test and the exposure trials. The authors also measured participants' heart rate (using an electrocardiogram [ECG]) and skin conductance levels. The results revealed that participants' fear of spiders did not diminish, and their heart rate and skin conductance levels remained elevated following the exposure trials. However, participants approached the spider more closely after the exposure trials than before them. Participants' negative beliefs toward spiders also diminished after the exposure trials. This decrease in negative beliefs could have enabled participants to approach the spider more closely after the exposure trials. Even though the VE remained the same, participants' behavior changed because the spider did not have the same meaning before and after the exposure trials. These results are concordant with those of Côté and Bouchard (2009) and Tardif, Therrien, and Bouchard (2019), who found that a reduction in dysfunctional beliefs after immersions in VR decreased participants' fear of spiders. Participants also approached a live tarantula more closely than they did pre-immersion.
Implicit and Explicit Behavior
Finally, VR is interactive (Steuer, 1992): a user's implicit and explicit behavior will further influence how the VE responds and which virtual and/or somatosensory stimuli (input) will be received and integrated by the brain (information processing), which in turn shapes the subjective experience. For instance, if the user's task is to pick up a virtual box, they will physically move the controllers to move their virtual hands to pick it up. The VE will update the locations of their virtual hands according to their physical hand movements. The updated locations of their virtual hands will indicate whether they have successfully picked up the box or whether they need to try again. Given that visual and proprioceptive information (input) is necessary to complete this task, it will be sent back into the information processing loop and integrated. In short, implicit and explicit behavior feeds back into the loop of input, information processing, and subjective experience (see the model in Figure 1).
3 Future Research
Testing the model of presence proposed in Figure 1 could be done by targeting the role of implicit multisensory information in subjective presence. Multisensory integration and visual realism could be experimentally manipulated by adding slight, non-explicit delays between stimuli (i.e., disrupting multisensory integration) and contrasting the effect on presence with manipulations of the level of visual realism (e.g., number of polygons, lightmaps, textures). If the model holds, slight delays in multisensory integration are expected to be detrimental to the believability of the VE, and thus to presence, even more so than explicitly reduced visual realism. The impact of the delays should also be objectively measurable through behavioral measures of presence. When considering implicit information processing and behaviors, it would be interesting to examine the relationship between presence (the plausibility and place illusions), visual realism, and multisensory integration in the context of fear. Many VEs used in VR are relevant to the treatment of anxiety disorders and involve fear (Wiederhold & Bouchard, 2014), which would provide a specific context in which to apply the model. Most importantly, fear influences attention and information processing, focusing them on threat and safety cues, and implicitly mobilizes action toward seeking safety. In the context of fear, it may therefore be easier to control for implicit and explicit information processing and to observe behavior that informs researchers about presence.
Additional future research could study the impact of expectations on presence, since previous experience appears to shape the brain's expectations. The appearance of virtual characters and/or objects could be experimentally manipulated. If their appearance does not undermine the believability of the VE (i.e., they resemble and behave like people and objects found in the physical environment or in a previous immersion), users will likely still implicitly react to the characters and/or objects as if they were “real.” If their appearance does undermine the believability of the VE (e.g., the human virtual characters have six legs and there is no logical explanation for why this is the case), users’ behavior in the VE will likely not resemble their behavior in the physical environment or in a previous immersion where the virtual characters had two legs.
In addition, it could be interesting to study the plausibility and place illusions separately, since they appear to be distinct concepts (i.e., feeling that the virtual scenario is actually occurring vs. feeling of “being there” in the VE). Some factors may have a larger influence on the plausibility illusion, while others may have a larger influence on the place illusion. Perhaps perceived realism has a larger effect on the plausibility illusion than on the place illusion, since it contributes to the believability of the VE. Users could still feel as if they “are there” in the VE yet feel that reduced perceived realism is weakening their sense that the virtual events are actually occurring.
Sensory and emotional stimuli are integrated in a multisensory manner to create a coherent representation of the world (Chalmers et al., 2009; Stein et al., 2009; Stein et al., 2014; Welch & Warren, 1980). Multisensory conflicts can lead to various perceptual illusions (e.g., the ventriloquism effect, body illusions). Two important multisensory integration principles have emerged from research: (1) the temporal principle, and (2) the spatial principle.
The brain seems to process sensory and emotional information in a similar way in VR. Although there are interesting neuroscientific models that explain how various perceptual illusions are induced in VR (i.e., Gonzalez-Franco & Lanier, 2017) and how embodiment is related to presence in VR (Riva et al., 2018; Riva et al., 2019), these models do not relate specifically to presence and multisensory integration. This is why a new model is proposed using elements of the models put forward by Lombard and Ditton (1997), Gonzalez-Franco and Lanier (2017), Riva et al. (2018), and Riva et al. (2019).
Future research could experimentally manipulate expectations (e.g., added delays in the VE, the appearance of virtual characters and objects) and further explore the mechanisms of the plausibility and place illusions and different factors that impact these illusions. This research would help us to better understand the contribution of multisensory integration to presence.
Disclosure: Stéphane Bouchard is the President of, and owns equity in, Cliniques et Développement In Virtuo, a spin-off from the university that uses virtual reality and distributes virtual environments. The terms of these arrangements have been reviewed and approved by the Université du Québec en Outaouais in accordance with its conflict of interest policies. The remaining authors report no financial relationships with commercial interests. This article is part of the doctoral thesis from the first author.