Abstract
Collaborative virtual environments (CVEs), wherein people can virtually interact with each other via avatars, are becoming increasingly prominent. However, CVEs differ in type of avatar representation and level of behavioral realism afforded to users. The present investigation compared the effect of behavioral realism on users' nonverbal behavior, self-presence, social presence, and interpersonal attraction during a dyadic interaction. Fifty-one dyads (aged 18 to 26) embodied either a full-bodied avatar with mapped hands and inferred arm movements, an avatar consisting of only a floating head and mapped hands, or a static full-bodied avatar. Planned contrasts compared the effect of behavioral realism against no behavioral realism, and compared the effect of low versus high behavioral realism. Results show that participants who embodied the avatar with only a floating head and hands experienced greater social presence, self-presence, and interpersonal attraction than participants who embodied a full-bodied avatar with mapped hands. In contrast, there were no significant differences on these measures between participants in the two mapped-hands conditions and those who embodied a static avatar. Participants in the static-avatar condition rotated their own physical head and hands significantly less than participants in the other two conditions during the dyadic interaction. Additionally, side-to-side head movements were negatively correlated with interpersonal attraction regardless of condition. We discuss implications of the finding that behavioral realism influences nonverbal behavior and communication outcomes.
1 Introduction
The increased availability and consumer adoption of virtual reality (VR) headsets have led to the active development of social interaction platforms wherein people are able to use VR systems to communicate with each other inside virtual environments despite being in different geographical locations. For researchers and organizations creating these platforms, one of the key questions is how avatars (i.e., digital representations controlled by users in mediated environments [Fox et al., 2015]) should be designed to best represent users and maximize intended interpersonal outcomes.
Recent developments in social VR platforms include Sansar, VRChat, AltspaceVR, PlutoVR, High Fidelity, and Facebook Spaces. While one of the main goals of these VR platforms is to offer a vivid social experience by enhancing feelings of environmental presence (the sense of “being there” [Steuer, 1992; Lombard & Ditton, 1997]), self-presence (the psychological state in which the virtual self is experienced as the actual self [Biocca, 1997]), and social presence (the sense of “being there with someone else” [Biocca, Harms, & Burgoon, 2003]), they differ markedly in terms of avatar representation. Most notably, not all platforms afford full-body representation. For example, Facebook Spaces allows for only head, torso, and arm representation, while VRChat and High Fidelity allow for full-body representation.
Another key difference is the level of behavioral realism avatars exhibit. Behavioral realism is defined as the degree to which human avatars are able to exhibit naturalistic behaviors in real time the way humans are expected to in the physical world (Blascovich et al., 2002; Swinth & Blascovich, 2002). For example, when we walk down the street in the real world, we expect the people we pass to behave in a familiar way (i.e., they will walk past us by putting one foot in front of the other while subtly swaying their arms). However, if instead of seeing someone walk past us, we see someone violently flailing their arms while jumping up and down, social and behavioral expectations are violated, resulting in negative interactions. The same expectation of behavioral realism occurs with virtual humans (Blascovich, 2002). An avatar that looks human is expected to behave as a human, and it exhibits behavioral realism when it is perceived to be moving, walking, and gesturing as a real human would in the physical world. Depending on the VR system or the social VR platform, users' physical movements may or may not be mapped onto their avatars in real time, either hindering or enhancing behavioral realism. For example, some platforms, like the ones described above, let the user's physical head and hand movements control the head and hand movements of the avatar in real time by using tracking systems. However, other platforms do not provide hand tracking (i.e., the user's physical hands do not control virtual hands), thus depriving users of the ability to gesture appropriately and reducing behavioral realism. These users are still able to press buttons on the keyboard or use their mouse to select preprogrammed gestures, but are not able to gesture naturally the way they would in a face-to-face (FtF) interaction.
This definition of behavioral realism is not limited to the avatars the user interacts with in a virtual environment, but encompasses the self and others, since the avatar a user embodies and the avatars of the people that the user interacts with can both exhibit different levels of behavioral realism depending on the VR system they are using.
Past research has demonstrated the importance of nonverbal behavior in establishing and maintaining relationships (Argyle & Williams, 1969), developing trust (Burgoon, Birk, & Pfau, 1990), and forming initial judgments about others (Argyle, Alkema, & Gilmour, 1971; Riggio & Friedman, 1986; Schneider, Hastorf, & Ellsworth, 1979). Yet the same nonverbal cues that convey social meaning (Burgoon, Guerrero, & Manusov, 2011), allow people to convey emotion (Argyle, 1988; McMahan, 1976), and build rapport (DiMatteo, Taranta, Friedman, & Prince, 1980; Gratch, Kang, & Wang, 2013) are often lost through computer-mediated communication (CMC). This is not to suggest that people cannot convey social meaning or emotion through CMC. In fact, past research has demonstrated that people are still able to build rapport and convey meaning through CMC, although the process may take longer (Walther, 1992, 1996). However, social VR platforms, also known as collaborative virtual environments (CVEs), are the first CMC medium that can take advantage of both verbal and nonverbal communication channels that were previously available only in FtF communication (Roth, Latoschik, Vogeley, & Bente, 2015).
Given the wide range of avatar representations and VR systems that are available as well as the increased use of CVEs, researchers have started to examine the influence of different types of avatar representation and customization on subjective measures of user experience (Heidicker, Langbehn, & Steinicke, 2017; Zibrek, Kokkinara, & McDonnell, 2018). However, very little is known about the effects that different levels of behavioral realism have on nonverbal behavior in real-time CVE interactions, and to our knowledge, no study has examined the effect of behavioral realism on the particular measure of interpersonal attraction. In addition, while the advancement of VR tracking systems has greatly facilitated the quantification of nonverbal behavior, only a handful of researchers have studied whether nonverbal behavior in CVE interactions correlates with communication outcomes. Thus, the current investigation focuses on (1) comparing the effect of embodying different types of avatar representations that afford varying levels of behavioral realism on interpersonal attraction, self-presence, and social presence, and (2) exploring the relationship between participants' nonverbal behavior and interpersonal attraction.
1.1 Nonverbal Communication and Social Virtual Interactions
Past research has demonstrated that nonverbal communication is a key component in interpersonal communication in that it can complement, supplement, and even replace verbal communication (Ekman & Friesen, 1979). Nonverbal behavior is also closely tied to emotions (Riggio & Friedman, 1986), can reveal psychological states (Won, Perone, Friend, & Bailenson, 2016), and reflect one's evaluations of people, objects, and situations (Schneider, Hastorf, & Ellsworth, 1979).
Gestures in particular have been found to have an effect on the perception and evaluation of social experiences and interaction partners in FtF settings and in virtual environments (Huang, Morency, & Gratch, 2011; Krämer, Tietz, & Bente, 2003; Slater & Steed, 2002). For example, when participants were presented with both verbal and nonverbal cues in the form of hand and facial gestures, more importance was given to the nonverbal cues than to what was actually being said by the participant's interaction partner (McMahan, 1976). In a separate study, Slater and Steed (2002) showed that participants who received negative nonverbal feedback from a virtual audience experienced higher levels of anxiety than participants who received neutral or positive nonverbal feedback from the same virtual audience. Past research has also shown that avatars (i.e., digital representations controlled by users) or agents (i.e., digital representations of algorithms) that display even small amounts of eye contact and random movement are evaluated more positively than static avatars (Heylen et al., 2002) and can foster strong feelings of rapport (Gratch, Kang, & Wang, 2013; Huang, Morency, & Gratch, 2011). These results suggest that the behavioral realism exhibited by avatars or agents that participants interact with can affect the way users evaluate their interaction partners and develop feelings of rapport. However, little is known about the effect that embodying an avatar that exhibits behavioral realism has on interpersonal outcomes during social interactions.
In general, when it comes to CVEs, gestures can be performed normally only if the technology allows users to map their gestures onto their avatars in real time. As such, it is possible that allowing or restricting body movements in VR will enhance or limit the behavioral realism exhibited by the user's avatar or the avatars the user is interacting with, and thus affect interpersonal outcomes and the virtual experience as a whole.
1.2 Behavioral Realism and Avatars
In avatar-mediated communication, the level of behavioral realism (i.e., extent to which avatars exhibit naturalistic behaviors in real time) is dependent on the amount of control that a user has of his or her avatar movements. A full-bodied avatar that does not have any mapped movements is less behaviorally realistic than an avatar that has mapped hands because an avatar that cannot be controlled by its user cannot gesture or behave as its user does. As mentioned above, some platforms allow for full-body representations. These platforms use the real-time tracking data of specific body parts (e.g., head and hands) to estimate the position of other joints. As a result, there is always a risk of error and unnatural body movements of joints that are not being actively tracked. Other virtual platforms only allow users to see the parts of their avatar that they can control (e.g., avatars that consist of only a head, torso, and arms for VR systems that only track the head and hands). This incomplete, yet fully controlled avatar representation may change the way behavioral realism is perceived by users because the body parts that remain static or that could behave unrealistically due to calculation errors are not rendered. Thus, the entire representation users embody or interact with behaves naturalistically. The rendering of an incomplete body that is controlled in its entirety could allow users to focus on the social interaction rather than the technical and behavioral flaws that occur when nontracked body parts are rendered, and potentially improve interpersonal outcomes and increase self and social presence.
Additionally, just as there is an expectation of behavioral realism in the real world, there is an expectation of behavioral realism when interacting with human-like avatars inside virtual environments (Blascovich, 2002). The more photorealistic an avatar appears to the user (i.e., the more human-like the avatar looks), regardless of whether it is the user's avatar or the avatar he or she is interacting with, the higher the expectation of behavioral realism (Slater & Steed, 2002; Garau et al., 2003). The violation of this expectation can have negative effects on interpersonal outcomes. For example, past research has demonstrated that when there is a mismatch between avatar photorealism and behavioral realism, social presence is reduced for participants and their interaction partners (Bailenson et al., 2005; Garau et al., 2003). Past research also shows that static avatars (i.e., avatars that cannot be fully controlled by their users and therefore exhibit no behavioral realism) are perceived by participants as unwilling to express emotion, and are therefore unsuccessful when they are used to communicate with others in mediated environments (Vilhjálmsson & Cassell, 1998).
Past research has also demonstrated that when participants embody avatars that exhibit behavioral realism (operationalized by mapped head and arm movements), the illusion of body ownership increases (Sanchez-Vives et al., 2010). These results suggest that apart from increasing social presence and allowing users to convey emotion or information through nonverbal behavior, the existence of visuomotor synchrony (i.e., when users are able to see that their avatar is moving synchronously with their own body in real time) heightens the illusion of body ownership, and therefore, increases self-presence (Peck, Seinfeld, Aglioti, & Slater, 2013). These studies provide evidence showing that when avatars exhibit behavioral realism through visuomotor synchrony, self-presence increases.
In a recent dyad study, Heidicker and colleagues (2017) examined the effect of three different types of avatar representation on environmental presence and social presence. Results showed that dyads in which both participants were able to control their avatar's hands and were represented by a full-bodied avatar felt more present in the virtual environment than participants who embodied an avatar with predefined animations or participants who were only represented with their avatar's head and hands. However, there was no significant difference on social presence scores between these conditions (Heidicker et al., 2017).
In a similar study, Dodds, Mohler, and Bülthoff (2011) had dyads embody an avatar that either allowed them to gesture with their hands or whose hands remained static during a task where participants had to explain different words to each other. Results showed that dyads who were able to control their avatar's hands during the task used significantly more gestures and were significantly more successful at the task than dyads that were not able to gesture at all. These results suggest that the level of behavioral realism afforded to the participants via their avatar may have an effect on their overall nonverbal behavior and feelings of environmental presence but not necessarily social presence.
1.3 Model of Social Influence
An immersive virtual environment (IVE) is a fully immersive and interactive computer-generated environment that gives the user the feeling of being somewhere other than where they are in the physical world. VR systems block out the perceptual input from the real world and replace it with perceptual input from a virtual environment that surrounds the user, is fully responsive to the user's actions, and elicits feelings of presence. CVEs are IVEs where two or more users in remote physical locations can interact with each other inside the same virtual environment as if they were in the same physical location.
In virtual social interactions, Blascovich and colleagues' theoretical model of social influence (Blascovich, 2002; Blascovich et al., 2002; Blascovich & Bailenson, 2011) posits that the degree of behavioral realism and the extent to which the user believes he or she is interacting with a sentient being in IVEs or CVEs moderate the effects virtual humans have on users. When participants interacted with avatars that were high in behavioral realism, they reported higher feelings of social presence (Guadagno et al., 2007). In a different study, virtual agents that exhibited a high degree of behavioral realism and mirrored participants' head movements with a four-second delay were rated more positively and were more influential than virtual agents that did not display realistic behaviors within the context of the interaction (Bailenson & Yee, 2005).
Directly germane to this conjecture, Steed and colleagues (2016) demonstrated that participants performed better on a memory task inside an IVE when they embodied an avatar whose movements mirrored their own compared with when they were not allowed to move their hands, or when they did not have a virtual representation at all. This study also showed that when participants did not have a virtual representation, they did not move their hands as much as the participants in the other two conditions, suggesting that physical movements may be influenced by the type of avatar representation afforded.
A different study showed that embodying a full-bodied avatar, when compared to floating limbs or no avatar representation, increased virtual body ownership and self-presence (Petkova et al., 2011). In addition, Usoh and colleagues (1999) demonstrated that users felt more comfortable whenever they were able to control their avatar's movements with their own physical movements in real time than when they had no control. These studies specifically looked at what happens when the user's own avatar is manipulated, and offer cogent evidence that the ability to control the physical movements of one's virtual representation (i.e., high degree of behavioral realism) affects physical behavior, interpersonal attitudes, and cognition (Yee, Bailenson, & Ducheneaut, 2009).
2 Present Study
Past research has shown that embodying a full-bodied self-avatar with mapped hands can increase task performance (Dodds et al., 2011; Pan & Steed, 2017; Steed et al., 2016) and environmental presence (Heidicker et al., 2017; Petkova et al., 2011) when compared to no self-avatar or static avatars. When it comes to social presence, findings are mixed. Heidicker and colleagues (2017) found no significant differences among full-bodied mapped avatars, programmed animated avatars, and floating hands. However, Guadagno and colleagues (2007), as well as Garau et al. (2003), have demonstrated that avatars exhibiting higher behavioral realism lead to higher feelings of social presence in dyadic studies. These studies differ in that Guadagno and colleagues (2007) had participants interact with agents rather than avatars, as was the case with Garau et al. (2003) and Heidicker et al. (2017). Additionally, these studies differ methodologically in that Garau et al. (2003) operationalized behavioral realism through gaze, whereas Heidicker et al. (2017) operationalized behavioral realism through a participant's head and hand movements.
Overall, these findings suggest that avatar representations inside virtual environments affect both individual and interpersonal outcomes. However, it is important to note that participants did not interact with other avatars in most of these studies, and that even when participants were afforded full-body representation, they were not always able to control their avatar's hands with their own physical hands. Additionally, even when participants were able to interact with other avatars, the effect of avatar representation or behavioral realism on interpersonal attraction (i.e., the extent to which participants liked and got along with their partners based on the rapport built during their interaction rather than physical attributes [Davis & Perkowitz, 1979]) has not been assessed. Thus, the present study examined the effect of behavioral realism on nonverbal behavior, self-presence, social presence, and interpersonal attraction in a dyadic interaction where two unacquainted participants interacted with each other for the first time inside a CVE.
Different types of avatar representation can influence physical behavior, perceptions of the self and others, and subjective ratings of social interactions inside virtual environments (Bailenson & Yee, 2005; Bailey, Bailenson, & Cassanto, 2016; Krämer et al., 2003; Slater & Steed, 2002; Yee et al., 2009). Based on these findings and Blascovich's model of social influence, we predict that avatars that exhibit any level of behavioral realism will lead to higher levels of social presence, self-presence, and interpersonal attraction than avatars that do not behave realistically. We also predict that participants who are able to control the entirety of their avatar, thus exhibiting higher behavioral realism, will report higher self-presence, social presence, and interpersonal attraction than those who are not able to control their avatars in their entirety (i.e., lower behavioral realism).
Past research also suggests that preventing a user from being able to naturally gesture through their avatar during a dyadic interaction significantly reduces the number of gestures produced in the real world during the interaction (Dodds et al., 2011; Steed et al., 2016). In other words, if an avatar is not able to show the physical gestures of the person controlling it, users tend to stop producing those gestures while they are embodying that avatar since no one in the virtual environment is able to perceive them. We aim to replicate this finding, anticipating that participants with static avatars will move their hands significantly less than participants who are able to control their avatar's hands with their own hand movements. To our knowledge, this will be the first study to examine the effect of behavioral realism on interpersonal attraction scores.
3 Method
3.1 Participants
A total of 102 participants (36 men, 66 women) were recruited from a medium-sized western university. Of these, 51 (50%) were White, 25 (24.51%) were Asian, 8 (7.84%) were Hispanic, 9 (8.82%) were Black, and 9 (8.82%) reported another ethnicity. All participants were assigned to same-sex dyads (in order to account for potential sex effects) for a total of 51 dyads. Participants were young adults aged 18 to 26 years (M = 20.53, SD = 1.72).
3.2 Materials and Apparatus
Participants who had never met in the real world interacted with each other for the first time inside a CVE using a head-mounted display (HMD), a pair of hand controllers, and headphones. The HMD was an HTC Vive with a resolution of 1080 × 1200 per eye and an update rate of 90 frames per second. An optical tracking system (Valve Lighthouse, update rate of 120 Hz) and three six-degree-of-freedom sensors (HTC Vive HMD and hand controllers) were used to track the participant's physical head and hand rotation (pitch, yaw, and roll). Figure 1 shows the room setup.
The male and female avatars used in this study were not assessed quantitatively in terms of their physical attractiveness. However, the male avatar was created from the same avatar mesh as the female avatar and superimposed onto a male body, ensuring a similar level of physical attractiveness. The social VR platform used to render and program the CVE was High Fidelity.
3.3 Design
This study adopted a between-dyad design with three conditions, namely, Static Hands, Inferred Arms, and Floating Hands. Both participants in the dyad were assigned the same type of avatar representation within each condition. To reflect the current landscape of behavioral realism available in social VR platforms, three different types of avatar representations were created. In this study, behavioral realism is operationalized by the presence or absence of hand visuomotor synchrony.
In the first condition, Static Hands, participants embodied a full-bodied avatar with no hand or foot tracking (i.e., participants' physical hand and foot movements were not mapped onto their avatar). In the second condition, Inferred Arms, participants were represented by a full-bodied avatar with head and hand tracking, as well as arm movements inferred via a standard inverse kinematics algorithm, but no foot tracking. In the third and last condition, Floating Hands, participants embodied an avatar showing only a floating head and floating hands, with both head and hand tracking. These avatar representations were created in order to more accurately assess the effect of having mapped hands during a virtual social interaction and to examine the role of behavioral realism in physical nonverbal behavior, social presence, self-presence, and interpersonal attraction. Figure 2 shows the full-bodied avatar used for the Inferred Arms and Static Hands conditions as well as the floating head and hands for the Floating Hands condition. The avatars used do not differ in terms of photographic realism.
For the Inferred Arms condition, the standard inverse kinematics algorithm embedded within the High Fidelity platform was used to infer the angle of the elbow and shoulder joints based on the hand tracking. In terms of behavioral realism, the only difference between the Inferred Arms and Static Hands conditions was that participants in the Inferred Arms condition were able to control the hands and arms of their avatar whereas the avatar's hands of the participants in the Static Hands condition remained static regardless of how participants moved their physical hands.
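To make concrete what inferring joint angles from a tracked hand involves, the sketch below solves the textbook planar two-link case with the law of cosines. It is an illustration under simplifying assumptions (a 2D arm, a shoulder fixed at the origin, hypothetical segment lengths) and is not the algorithm implemented in High Fidelity; the function names `two_link_ik` and `hand_position` are ours.

```python
import math

def two_link_ik(x, y, upper_arm, forearm):
    """Return (shoulder, elbow) angles in radians that place the hand at (x, y)."""
    dist_sq = x * x + y * y
    # Law of cosines; clamping maps unreachable targets to a fully
    # extended (or fully folded) arm instead of raising a math error.
    cos_elbow = (dist_sq - upper_arm**2 - forearm**2) / (2 * upper_arm * forearm)
    cos_elbow = max(-1.0, min(1.0, cos_elbow))
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(
        forearm * math.sin(elbow), upper_arm + forearm * math.cos(elbow)
    )
    return shoulder, elbow

def hand_position(shoulder, elbow, upper_arm, forearm):
    """Forward kinematics, used here only to check the inferred angles."""
    elbow_x = upper_arm * math.cos(shoulder)
    elbow_y = upper_arm * math.sin(shoulder)
    return (
        elbow_x + forearm * math.cos(shoulder + elbow),
        elbow_y + forearm * math.sin(shoulder + elbow),
    )

# Hypothetical segment lengths (meters) and tracked hand position.
shoulder, elbow = two_link_ik(0.4, 0.2, upper_arm=0.3, forearm=0.3)
print(hand_position(shoulder, elbow, 0.3, 0.3))
```

Even in this simplified case the solution is not unique: mirroring the elbow bend reaches the same hand position, so a tracked hand alone does not determine the pose of the untracked joints.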
Participants in the Inferred Arms and Floating Hands conditions were equally limited in their ability to control their avatars: only the head and hand movements of the participants were mapped onto their respective avatars. However, participants in the Inferred Arms condition were able to see that they and their partners were not able to control their full body, whereas participants in the Floating Hands condition were able to control their rendered head and hands and see that their partner could also control their own floating head and hands in real time. Given that inverse kinematics algorithms still have limited accuracy and can produce anomalies that do not follow the rules of human anatomy, and that participants were not able to fully control their avatars, we designated the Inferred Arms condition as lower in behavioral realism than the Floating Hands condition. Thus, the Static Hands condition has no behavioral realism, the Inferred Arms condition has low behavioral realism, and the Floating Hands condition has high behavioral realism.
These avatar representations allow us to compare the effects of no behavioral realism (Static Hands) versus behavioral realism (Inferred Arms and Floating Hands), as well as the effects of low versus high behavioral realism (Inferred Arms versus Floating Hands), on nonverbal behavior and interpersonal outcomes. Additionally, this study design allows us to examine the relationship between avatar representation and behavioral realism. In other words, by comparing the Inferred Arms and Floating Hands conditions, which allow users to have the same level of control over their avatars but differ markedly in terms of avatar representation, we will be able to examine the effect of being able to control all of the rendered parts of one's avatar against only being able to control some of them, as is the case with most commercial VR systems that are currently available.
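The two comparisons described above can be expressed as contrast weights over the three conditions. The sketch below uses the conventional coding for such contrasts; the specific weights, the function names, and the scores are assumptions for illustration, since the weights actually used are not reported here and the scores are not data from this study. The code computes contrast estimates only and does not perform a significance test.

```python
from statistics import mean

# Condition order: Static Hands, Inferred Arms, Floating Hands.
REALISM_VS_NONE = (-2, 1, 1)  # no behavioral realism vs. any behavioral realism
HIGH_VS_LOW = (0, -1, 1)      # low (Inferred Arms) vs. high (Floating Hands)

def contrast_estimate(weights, group_scores):
    """Weighted sum of the group means for one planned contrast."""
    return sum(w * mean(scores) for w, scores in zip(weights, group_scores))

def are_orthogonal(a, b):
    """With equal group sizes, contrasts are orthogonal if their dot product is zero."""
    return sum(x * y for x, y in zip(a, b)) == 0

# Made-up scores for illustration only.
scores = [
    [3.0, 3.5, 4.0],  # Static Hands
    [3.0, 3.2, 3.4],  # Inferred Arms
    [4.0, 4.5, 5.0],  # Floating Hands
]

print(are_orthogonal(REALISM_VS_NONE, HIGH_VS_LOW))
print(contrast_estimate(REALISM_VS_NONE, scores))
print(contrast_estimate(HIGH_VS_LOW, scores))
```

A positive estimate on the first contrast would indicate higher scores with behavioral realism than without it; a positive estimate on the second would indicate higher scores for Floating Hands than for Inferred Arms.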
3.4 Procedure
Participants were asked to come to different locations to prevent them from interacting with their partner prior to the virtual interaction. After participants signed a consent form, they were randomly assigned to one of the three conditions and were led into separate study rooms. Participants met for the first time inside the virtual environment. Participants who knew each other prior to the virtual interaction were excluded from the analysis.
Prior to entering the virtual environment, a researcher told participants that they would play the 20 Questions Game with a partner (Bailenson, Beall, & Blascovich, 2002; Bailenson & Yee, 2006; Krämer, Simons, & Kopp, 2007). Participants were told that their partner was a real person in a different physical location, and that they would see their partner's avatar once they were inside the virtual environment. The rules of the 20 Questions Game were explained to all of the participants regardless of whether or not they were familiar with the game.
During the 20 Questions Game, participants took turns asking “yes” or “no” questions to their partner, trying to guess the word on a card that their partner picked out of a set of five cards. Participants were each assigned one of the following words: “apple” or “boat” regardless of which card they picked. The game had two rounds; during the first round one partner would ask the questions and during the second round, the other participant would ask the questions. Each round ended when the partner asking the questions guessed the word correctly, or when he or she asked 20 questions without being able to guess their partner's word. Inside the CVE, a counter let participants know how many questions they had left before the end of each round. After receiving these instructions, participants put on the HMD and were given the hand controllers in their respective rooms. On average, the dyadic interactions lasted 13.33 minutes (SD = 2.92) and ranged between 10 and 18 minutes.
The CVE that participants entered resembled an empty living room inside an apartment (see Figure 3). A gray wall in the virtual environment separated participants, preventing them from seeing each other until it was time to play the 20 Questions Game. Participants were able to see the avatar body they were assigned to by looking down and were able to walk around the CVE if they wanted to. Participants were also able to see their partner's body, which matched their own. However, there were no mirrors in the virtual environment, so participants were not able to see their avatar's face and therefore were not able to tell that they were embodying the same avatar as their interaction partner. Participants were asked to point to different areas in the virtual environment. Participants in the two mapped-hands conditions (Inferred Arms and Floating Hands) were able to see that the movements of their avatar hands mirrored those of their physical hands. The virtual hands and arms of the participants in the Static Hands condition did not move regardless of the movement of the participant's physical hands. After this exercise, the experimenter put headphones on the participant, and the virtual gray wall was removed to allow the participants to see each other. Once both participants confirmed that they could see and hear each other, participants began the 20 Questions Game. Participants communicated with each other using the built-in microphone in the HTC Vive HMD and a pair of headphones. After the game, participants were asked to spend five minutes getting to know their partner as best as they could. Past research has demonstrated that most mediated and FtF interactions consist of both task-oriented and socioemotional exchanges (Peña & Hancock, 2006; Rice & Love, 1987; Walther, Anderson, & Park, 1994). We therefore included both the task-oriented 20 Questions Game and the socioemotional “getting to know you” task to enhance the external validity of our dyadic interaction and provide a more natural interaction between participants.
When the five minutes were up, the virtual wall appeared automatically and the participants were no longer able to see each other. Researchers made sure that participants were not able to see each other after the experimental manipulation. After the interaction, participants were led into different rooms where they completed a questionnaire and were debriefed by the experimenter.
3.5 Measures
3.5.1 Manipulation Checks
Two manipulation checks were employed in order to assess whether or not participants were aware of their own avatar's level of behavioral realism as well as their partner's. First, all participants were asked to report whether or not their partner could control his or her avatar's hands. Then, they were asked to report whether they could control their own avatar's hands. A failure consisted of giving the wrong response to either question.
3.5.2 Nonverbal Behavior
To determine if level of behavioral realism influenced nonverbal behavior, the rotation (pitch, yaw, and roll) of the participant's head and hands was tracked 10 times per second. Pitch represents the participant's head and hand movements around the X-axis (e.g., nodding “yes”). Yaw represents the participant's head and hand movements around the Y-axis (e.g., shaking “no”), and roll represents the participant's head and hand tilting around the Z-axis (e.g., touching the left ear to the left shoulder or the right ear to the right shoulder). Figure 4 illustrates these head rotations.
3.5.2.1 Head Rotation
Following Won and colleagues (Won, Bailenson, and Janssen, 2014; Won et al., 2016), the extent to which the participant moved his/her head was calculated by averaging the standard deviations of the rotational data for the duration of the interaction. For example, the average standard deviation of yaw reflects the tendency of a participant to rotate his or her head from left to right inside the virtual environment. Total head rotation was calculated by averaging the pitch, yaw, and roll standard deviations from the HMD. The value obtained from this calculation represents the extent to which participants rotated their heads during the interaction. Higher values represent more head rotations during the interaction with their partner.
3.5.2.2 Hand Rotation
Similar to head rotation, total hand rotation was calculated by averaging the pitch, yaw, and roll standard deviations from both controllers. The value obtained from this calculation represents the extent to which participants rotated their hands during the interaction. Higher values represent more hand rotations during the interaction with their partner.
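The two rotation measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the use of the sample standard deviation (n − 1) is an assumption because the text does not specify it, and no correction for angle wrap-around is applied.

```python
from statistics import mean, stdev


def total_rotation(samples):
    """Average of the per-axis standard deviations for one tracked device.

    `samples` is a list of (pitch, yaw, roll) tuples in degrees, one per
    tracking frame (the study recorded 10 frames per second).
    """
    pitch, yaw, roll = zip(*samples)
    # Assumption: sample SD (n - 1); the paper does not state which form was used.
    return mean([stdev(pitch), stdev(yaw), stdev(roll)])


def total_hand_rotation(left_samples, right_samples):
    """Average the pitch, yaw, and roll SDs over both controllers."""
    # Averaging the two per-controller means equals averaging all six SDs.
    return mean([total_rotation(left_samples), total_rotation(right_samples)])


# Made-up frames for illustration only (not study data).
head_frames = [(0.0, 10.0, 1.0), (2.0, 14.0, 1.5), (4.0, 9.0, 0.5)]
head_score = total_rotation(head_frames)
```

Higher scores indicate that the rotation angles varied more over the interaction, which is how the measures above are interpreted.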
3.5.3 Interpersonal Attraction
Interpersonal attraction was measured with six items used in Davis and Perkowitz (1979). Sample items include “My partner is the type of person I could become close friends with” and “My partner is friendly.” Participants were asked (see Appendix) to rate how strongly they agreed with each statement on a 7-point Likert-type scale (1 = Strongly Disagree, 7 = Strongly Agree). The reliability of the scale was good, Cronbach's alpha = .85 (M = 5.79, SD = .59).
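For readers unfamiliar with the reliability statistic reported for each scale, the standard formula for Cronbach's alpha can be sketched as follows. The function name and the ratings are hypothetical and serve only as an illustration; the study's item-level data are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item ratings.

    `responses` holds one list per participant, each with one rating per item.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])                    # number of items in the scale
    items = list(zip(*responses))            # regroup ratings by item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Made-up ratings on a 7-point scale (three participants, three items).
example_ratings = [[6, 5, 6], [4, 4, 5], [7, 6, 7]]
alpha = cronbach_alpha(example_ratings)
```

Values closer to 1 indicate that the items vary together across participants, which is the sense in which the reliabilities above are described.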
3.5.4 Social Presence
Social presence was measured by adapting items from the Networked Minds Social Presence Inventory (Biocca & Harms, 2004) and Nowak and Biocca's (2003) social presence measure. Participants were asked (see Appendix) to rate how strongly they agreed with five statements on a 7-point Likert-type scale (1 = Strongly Disagree, 7 = Strongly Agree). Sample items include “I felt like I was in the same room as my partner” and “I felt like my partner was aware of my presence.” The adapted items were changed only to reflect the interaction context. The reliability of the scale was good, Cronbach's alpha = .76 (SD = .86).
3.5.5 Self-Presence
Self-presence was measured by slightly modifying five items from Aymerich-Franch, Karutz, and Bailenson (2012). Participants were asked (see Appendix) to rate how strongly they agreed with each statement on a 7-point Likert-type scale (1 = Strongly Disagree, 7 = Strongly Agree). Sample items include “I felt like my avatar's body was my own body” and “I felt like my avatar was me.” The reliability of the scale was good, Cronbach's alpha = .87 (SD = 1.16).
4 Results
Dyadic data violate the assumption of independence that is necessary in order to perform analysis of variance (ANOVA) or linear regression analysis (McMahon, Pouget, & Tortu, 2006; Srivastava et al., 2006). Because of this, a multilevel analysis (also known as a linear mixed model analysis) was used to account for the fixed effects of avatar representation and the random effects of individual interaction partners nested within each dyad. Two planned, orthogonal contrasts specifically tested our hypotheses (Rosenthal, Rosnow, & Rubin, 2000). We predicted (1) that being able to control your avatar's hands in real time would lead to higher self-presence, social presence, and interpersonal attraction scores than having a static avatar (i.e., Inferred Arms and Floating Hands versus Static Hands) and (2) that higher behavioral realism exhibited by avatars that are fully controlled by participants, even if it means not having a full-body representation, would lead to higher self-presence, social presence, and interpersonal attraction scores than avatars that have a full body that cannot be controlled in its entirety by participants (i.e., Floating Hands versus Inferred Arms). Gender was used as a covariate to account for any possible gender effects, since past research demonstrated that women and men differ in their ability to express and understand nonverbal behavior and behave differently when interacting with virtual humans (Bailenson et al., 2001; Bailenson et al., 2004; Bailenson et al., 2005). For this analysis we also included a compound symmetry structure to account for within-subject correlated errors. All dyads were treated as indistinguishable given that there were no specific traits differentiating one participant from the other within each dyad. All analyses were carried out in R version 3.0.2 using the lme4 and nlme packages.
The means and standard deviations of the outcome variables (i.e., interpersonal attraction, social presence, self-presence, and the average standard deviations for the head and hand rotation) by condition are summarized in Table 1.
Measure | Inferred Arms M (SD) | Static Hands M (SD) | Floating Hands M (SD) |
---|---|---|---|
Interpersonal Attraction | 5.57 (.99) | 5.56 (.79) | 5.89 (.76) |
Social Presence | 5.67 (.63) | 5.76 (.52) | 6.01 (.59) |
Self-Presence | 3.68 (1.29) | 3.81 (1.13) | 4.27 (.89) |
Hand Rotation | 365.51 (77.31) | 275.1 (99.5) | 367.63 (84.25) |
Head Rotation | 65.1 (17.5) | 56.3 (16.7) | 67.33 (18.21) |
SD yaw | 17.86 | 13.35 | 15.78 |
SD roll | 87.94 | 78.03 | 94.61 |
SD pitch | 89.05 | 77.56 | 91.60 |
Note. M = Mean; SD = Standard Deviation.
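The two planned contrasts can be illustrated on the descriptive means in Table 1. The sketch below is hypothetical in its details: the contrast weights are a conventional coding assumed here (the paper does not list them), and the resulting values are raw differences of descriptive means, not the mixed-model estimates, which additionally adjust for gender and for the nesting of participants within dyads.

```python
# Condition order follows the columns of Table 1.
CONDITIONS = ("Inferred Arms", "Static Hands", "Floating Hands")

# Assumed weights (not stated in the paper).
CONTRAST_REALISM = (1, -2, 1)   # both mapped-hands conditions vs. Static Hands
CONTRAST_LEVEL = (-1, 0, 1)     # Floating Hands vs. Inferred Arms


def is_orthogonal(a, b):
    """Two contrasts are orthogonal (for equal group sizes) if their dot product is 0."""
    return sum(x * y for x, y in zip(a, b)) == 0


def contrast_value(weights, means):
    """Weighted sum of the condition means for one contrast."""
    return sum(w * m for w, m in zip(weights, means))


# Interpersonal attraction means from Table 1, in CONDITIONS order.
attraction_means = (5.57, 5.56, 5.89)

realism_effect = contrast_value(CONTRAST_REALISM, attraction_means)  # about 0.34
level_effect = contrast_value(CONTRAST_LEVEL, attraction_means)      # about 0.32
```

Because each set of weights sums to zero and their dot product is zero, the two comparisons address separate questions: whether any behavioral realism matters, and whether its level matters.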
4.1 Manipulation Checks
The manipulation checks revealed that there were 3 participants who were not aware of what type of avatar representation they had, and 9 participants who were not aware of what type of avatar representation their partners had. Given the goal of this study was to examine the effects of different types of avatar representation on interpersonal outcomes, self-presence, and social presence, the participants who were not aware of either their own or their partner's avatar representation were excluded from the analysis. The 11 participants who failed the manipulation check were excluded from the analysis along with their partners. Additionally, 2 dyads were excluded due to technical errors that occurred during data collection, leaving a total of 76 participants and 38 dyads.
4.2 Nonverbal Behavior
The average standard deviations of the participant's rotational (pitch, yaw, and roll) data were used in this analysis.
4.2.1 Head Rotation
There was a significant effect of condition on total head rotation. Participants in the Floating Hands and Inferred Arms conditions rotated their heads significantly more than participants in the Static Hands condition. However, there was no significant difference between the Inferred Arms and Floating Hands conditions. These results show that participants who were able to control their avatar's hands with their physical hand movements rotated their heads significantly more than those with static avatars. However, whether participants had a full-body representation or just a floating head and hands did not make a difference in total head rotation throughout the dyadic interaction.
4.2.2 Hand Rotation
There was a significant effect of condition on hand rotation. Participants in the two mapped hands conditions, Inferred Arms and Floating Hands, rotated their hands significantly more than participants who embodied static avatars. Even though participants in the Floating Hands condition rotated their hands more than participants in the Inferred Arms condition according to the descriptive data, this difference was not statistically significant. These results show that participants who were able to control their avatar's hands with their own physical hands rotated their hands considerably more than participants who saw that their avatar did not mimic their physical movements. However, whether or not the participants embodied a full-bodied avatar did not influence total hand rotation.
4.3 Interpersonal Attraction
Participants in the Floating Hands condition reported higher levels of interpersonal attraction than participants in the Inferred Arms condition. In contrast, there was no significant difference in interpersonal attraction scores between the behavioral realism conditions (Inferred Arms and Floating Hands) and the Static Hands condition. Taken together, these results show that interpersonal attraction scores were not significantly affected by whether or not behavioral realism was exhibited. Yet, among participants who were able to control their avatar's hands, those who were able to control their avatar in its entirety reported liking their partners significantly more than participants who were unable to fully control their avatar representations and had algorithmically inferred body movements.
4.4 Social Presence
There was a marginally significant difference between participants in the Floating Hands and Inferred Arms conditions, with participants in the Floating Hands condition reporting higher levels of social presence than participants in the Inferred Arms condition. Once again, there was no significant difference in social presence scores between the mapped hands conditions and the Static Hands condition. Additionally, social presence was a marginally significant predictor of interpersonal attraction, with higher social presence scores leading to higher interpersonal attraction scores.
4.5 Self-Presence
The results of the self-presence scores follow the same pattern as the social presence and interpersonal attraction measures. Participants in the Floating Hands condition reported higher levels of self-presence than participants in the Inferred Arms condition. However, there was no significant difference in self-presence scores between the mapped hands conditions and the Static Hands condition.
4.6 Correlations among Dependent Variables
All simple correlations are included in Table 2. The correlation analysis showed that social presence and interpersonal attraction were significantly and positively correlated, indicating that participants who felt like they were with their partners inside the CVE liked their partners more than those who did not, regardless of condition (r = .28). Self-presence and social presence were positively correlated such that participants who reported feeling like their avatar body was their own also reported feeling like their partner was inside the CVE with them (r = .48). Average standard deviation of yaw was significantly and negatively correlated with interpersonal attraction, indicating that participants liked partners who did not constantly rotate their heads from side to side more than those who did (r = −.33). Figure 5 shows the scatterplot of the correlation between standard deviation of yaw and interpersonal attraction. The standard deviation of pitch was significantly and positively correlated with the standard deviation of roll and with total head rotation (r = .91 and r = .92, respectively).
Measure | Social Presence | Interpersonal Attraction | Self-Presence | SD yaw | SD pitch | SD roll | Total Hand Rotation | Total Head Rotation |
---|---|---|---|---|---|---|---|---|
Social Presence | — | 0.28** | 0.48**** | 0.04 | 0.00 | 0.06 | 0.09 | 0.04 |
Interpersonal Attraction | | — | 0.14 | −0.33** | 0.05 | −0.02 | 0.08 | −0.07 |
Self-Presence | | | — | 0.08 | 0.01 | 0.02 | 0.09 | 0.03 |
SD yaw | | | | — | −0.05 | 0.19 | 0.18 | 0.31*** |
SD pitch | | | | | — | 0.91**** | 0.28** | 0.92**** |
SD roll | | | | | | — | 0.29** | 0.98**** |
Total Hand Rotation | | | | | | | — | 0.32*** |
Total Head Rotation | | | | | | | | — |
Note. **, ***, ****; SD = Standard Deviation.
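As a reminder of what each entry in Table 2 represents, a simple (Pearson) correlation can be computed as sketched below. The function name and the data are made up for illustration and are not study data. Note also that a simple correlation of this kind treats every participant as an independent observation and does not model the dyadic nesting discussed at the start of the Results.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values: per-participant SD of yaw and attraction ratings.
sd_yaw = [12.0, 15.5, 18.0, 22.5, 25.0]
attraction = [6.5, 6.0, 5.5, 5.5, 4.5]
r = pearson_r(sd_yaw, attraction)  # negative: more side-to-side movement, lower rating
```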
5 Discussion
The present study examined the effect of behavioral realism on nonverbal behavior, social presence, self-presence, and interpersonal attraction during a dyadic interaction inside a CVE. Participants met each other for the first time inside the virtual environment, played a game, and spent five minutes getting to know each other. The interaction consisted of both task-oriented and socioemotional exchanges between the participants in each dyad in order to most accurately represent a FtF social interaction (Peña & Hancock, 2006; Rice & Love, 1987; Walther, Anderson, & Park, 1994).
Results showed that the level of behavioral realism afforded to participants influenced their physical nonverbal behavior in the form of total head and hand rotations. Replicating past research (Dodds et al., 2011; Steed et al., 2016) and in accordance with our predictions, participants whose avatar's hands reflected their physical movements and saw that their partners were also able to control their own hands rotated their own hands and heads significantly more than dyads that embodied static avatars. Even though most of us are not aware of our nonverbal behavior (Anderson, 2003), these results show that when participants were assigned to an avatar representation that hindered their ability to communicate nonverbally (i.e., participants' actual physical behavior was not represented by their avatars in real time as was the case for the Static Hands condition), they displayed less nonverbal behavior in the physical world during the dyadic interaction. A possible explanation for these results is that participants realized that their nonverbal behavior was not being perceived by their interaction partner, and thus, reduced the number of nonverbal behaviors produced during the interaction. It is also likely that participants who saw their interaction partners move less, as a result of their avatar representation, also moved less.
There was no significant difference between the behavioral realism conditions and the no behavioral realism condition (Inferred Arms and Floating Hands versus Static Hands) on social presence, self-presence, and interpersonal attraction scores. These findings suggest that although embodying a full-bodied self-avatar with mapped hands can lead to increased task performance inside IVEs, the same effects may not be transferable to CVEs. More specifically, past research has demonstrated that embodying a full-bodied avatar with mapped hand movements inside IVEs leads to increased body ownership and self-presence (Petkova et al., 2011) as well as improved task performance (Steed et al., 2016). In contrast, our results suggest that the advantages of having a full-bodied avatar with mapped hand movements within a single-user IVE may not be applicable in a virtual environment with multiple interactants (i.e., CVEs).
Within the behavioral realism conditions, members of dyads who were represented only by a floating head and hands reported feeling higher interpersonal attraction and self-presence than dyads in the Inferred Arms condition. This result suggests that full-body representation where both users in a dyad are not able to control their avatar in its entirety (i.e., users embody avatars low in behavioral realism), may be detrimental to interpersonal outcomes when compared to a different type of avatar representation that allows the user to control every body part that is rendered. This finding is in accordance with our predictions and is supported by Blascovich's model of social influence (2002). It is also possible that participants in the Inferred Arms condition, who embodied an avatar that represented their full body, had a higher expectation of behavioral realism for both their avatar and their partner's avatar that was not met, and thus, resulted in lower self-presence and interpersonal attraction. On the other hand, in the Floating Hands condition, where dyads embodied an avatar that did not represent their whole bodies, it is possible that participants had a lower expectation of behavioral realism. Since participants in these dyads were able to control every part of the avatar that was rendered, the avatars may have met behavioral realism expectations and resulted in higher levels of interpersonal attraction and self-presence when compared to the Inferred Arms condition.
It is important to note that there were no differences in terms of interpersonal attraction between the Static Hands and both behavioral realism conditions. Contrary to our predictions, inhibiting a participant's ability to gesture, as was the case for the participants in the Static Hands condition, did not lead to significantly lower interpersonal outcomes when they interacted with an avatar that also inhibited their partner's ability to gesture. As predicted, higher levels of social presence led to more positive interpersonal outcomes in the form of higher interpersonal attraction scores regardless of condition. However, it is important to note this result's small effect size.
The average standard deviation of yaw, the rotating of the head around the Y-axis (e.g., shaking your head implying “no” or looking left to right around the room), was significantly and negatively correlated with interpersonal attraction. Participants within each dyad who repeatedly scanned the room from left to right during the social interaction felt significantly less interpersonal attraction toward their partner regardless of what condition they were randomly assigned to. The repeated head movement from left to right could indicate scanning around the room, the avoidance of eye-contact, or disagreement with their interaction partner. Past research has correlated eye contact with positive social interactions (Kleinke, 1986) and scanning behavior with anxiety (Won et al., 2016). However, given our limited measures, an exact explanation as to why changes in yaw resulted in lower interpersonal attraction scores regardless of condition would require future investigation.
Taken together these results showed there were no significant differences in interpersonal attraction, social presence, or self-presence scores when comparing avatars that exhibit no behavioral realism and avatars that exhibit behavioral realism. However, when avatars do exhibit some level of behavioral realism, higher behavioral realism (exhibited by your own avatar and the avatar you are interacting with) leads to significantly higher feelings of self-presence and interpersonal attraction when compared to lower behavioral realism. Additionally, higher changes in yaw can be interpreted as a physical manifestation of dislike in virtual dyadic interactions regardless of type of avatar representation.
5.1 Limitations and Future Directions
There are a number of limitations to this study. First, our study design manipulated both of the participants' avatar representations within each dyad. This design allowed us to examine the effect of avatar representation and behavioral realism on interpersonal outcomes and nonverbal behaviors in CVEs where users share the same type of avatar representation and level of behavioral realism. However, the design of the current investigation does not allow us to discern whether the results obtained are a result of the level of behavioral realism that the avatar a participant embodied exhibits, the level of behavioral realism that the avatar of the participant's interaction partner exhibits, or a combination of both the type of avatar a participant embodied and interacted with. Future studies should examine the effects of embodying an avatar that exhibits a specific level of behavioral realism on interpersonal outcomes during an interaction with an avatar that exhibits a different level of behavioral realism in order to determine whether the results obtained from this investigation are based on the self, the interaction with others, or both.
As mentioned above, the movement measures used were limited. Future studies should test if the type of controller or tracking system has an effect on nonverbal behavior and make sure that there is a pure measure of hand movement, not just a measure of controller rotation. Another limitation is that the length of the dyadic interaction, which included both the 20 Questions Game and the 5-minute “getting to know you” task, was not standardized. On average, the interaction lasted about 13 minutes, but individual interactions ranged from 10 to 18 minutes. Moreover, participants were able to hear each other's voices during the experience. It is possible that participants' voices may have influenced how they felt about each other and had an effect on interpersonal attraction scores. It is also important to note that except for the hand rotation results, effect sizes were relatively small. Therefore, future studies should try to replicate this design with a larger sample size.
Additionally, there is a potential confound between behavioral realism and avatar representation. The Floating Hands condition and the Inferred Arms condition were equally limited in their ability to control their avatar—only the head and hands were mapped in real time. The main difference between these conditions was that the avatar body parts that the participants were not able to control with their physical movements were invisible in the Floating Hands condition and visible in the Inferred Arms condition. In order to tease apart these two constructs, future studies should try to replicate our results by comparing a full-bodied avatar that has mapped hands and feet against a condition in which participants embody a full-bodied avatar that only has mapped hands in order to assess the sole role of behavioral realism on interpersonal outcomes and nonverbal behavior.
Another limitation of the present study is that we did not assess whether or not participants actually believed that they were interacting with a real person. Researchers told participants that this was the case; however, participants were not explicitly asked about their beliefs. Future studies should assess whether or not participants' beliefs about the nature of the virtual human they are interacting with influence the effect of avatar representation or behavioral realism on interpersonal outcomes. Additionally, the current investigation did not explicitly measure the extent to which participants expected their avatar or their partner's avatar to exhibit behavioral realism. Future studies should include a measure of participants' expectations of behavioral realism for different levels of photorealistic avatars to further understand the relationship between behavioral and photographic realism.
In this study, we used High Fidelity as the social VR platform where participants interacted with each other. Although the platform delivered on its ability to seamlessly network participants from remote locations inside a CVE, the platform is still being developed and the software had to be routinely updated in order to be able to run the study. Some updates were not relevant to our experiment, while others changed the virtual environment slightly (e.g., sometimes lighting inside the virtual room and the environment outside of the apartment would change). Although the differences were small, they still led to a lack of full experimental control over the room.
Our results suggest that there is no substantive advantage of having physical control over your avatar's hands when it comes to social presence, self-presence, and interpersonal attraction. However, if hands are to be tracked and mapped onto an avatar, having a full-body representation that is not fully controlled by the user may hinder interpersonal outcomes. Future studies should try to replicate our findings using a different social context (e.g., a negotiation task, a collaborative task, an interview setting, or a more structured “getting to know you task”) to fully understand the effect of behavioral realism on interpersonal attraction. Future directions should also include the addition of feet tracking to the full-bodied avatar (therefore increasing behavioral realism) to examine whether or not there is a difference in social presence, self-presence, and interpersonal attraction scores between this new representation and an avatar that does not fully represent the user's body.
6 Conclusion
Overall, the results of this study showed that the level of behavioral realism afforded to users affects nonverbal behavior. More specifically, participants who saw that their physical movements were not mapped onto their avatars and that their partners were also not able to control their avatar's hands moved their real hands significantly less during the dyadic interaction than participants who embodied and interacted with avatars with mapped-hand movements. Additionally, being able to control every rendered aspect of an avatar leads to better interpersonal outcomes and higher levels of self-presence and interpersonal attraction even when the user's full body is not virtually represented. Finally, this study found that the standard deviation of yaw was negatively correlated with interpersonal attraction such that participants who constantly moved their heads from left to right disliked their partners more than those who did not move their heads as much.
Acknowledgments
The authors would like to thank Michael Arruza Cruz for his contribution to this work in terms of programming assistance.
References
Appendix Measures for Dependent Variables
Interpersonal Attraction
How strongly do you agree or disagree with the following statements about your partner?
I like my partner.
I would get along well with my partner.
I would enjoy a casual conversation with my partner.
My partner is the type of person I could become close friends with.
My partner is a good listener.
My partner is friendly.
Social Presence
How strongly do you agree or disagree with the following statements about your partner?
I felt like I was face-to-face with my partner.
I felt like I was in the same room as my partner.
I felt like my partner was watching me.
I felt like my partner was aware of my presence.
I felt like my partner was present.
Self-Presence
How strongly do you agree or disagree with the following statements about your avatar?
I felt like my avatar's body was my own body.
I felt like I was my avatar's body.
I felt like my avatar was an extension of me.
I felt like my avatar was me.
When something happened to my avatar, I felt like it was happening to me.