When observing social interactions, humans rapidly and spontaneously encode events in terms of agents, patients and causal relations. This propensity can be made visible empirically with the switch cost paradigm, a reaction time experiment and well-established tool of cognitive psychology. We adapted the paradigm for non-human primates to test whether non-linguistic animals encode event roles in the same way. Both human and non-human participants were asked to attend to different social interactions between two artificially coloured (blue or green) actors and to target the actor masked in a specified colour (e.g., blue), regardless of its role. We found that when we switched the targeted colour mask from agents to patients (or vice versa), processing time increased significantly in both hominid species (i.e., human and chimpanzee), suggesting that event roles were spontaneously encoded and subsequently interfered with our simplistic colour search task. We conclude that the propensity to encode social events in terms of agents and patients is a common feature of hominid cognition, as demonstrated in several human participants and one chimpanzee, pointing towards an evolutionarily old and phylogenetically shared cognitive mechanism central to language processing.

Understanding social interactions is fundamental for survival and a well-documented predictor of fitness (e.g., Alberts, 2019; Cheney et al., 2016; Silk, 2007), although relatively little is known about the underlying representational mechanisms. Extensive research with humans has consistently revealed a predisposition to attend to and process social interactions in the visual domain (McMahon & Isik, 2023). This may be grounded in a hardwired neurobiological predisposition to represent facing dyads as a fundamental feature of social perception (Papeo, 2020). Indeed, the visual system prioritises socially interacting over non-interacting pairs (Papeo et al., 2019; Skripkauskaite et al., 2023; Su et al., 2016; Vestner et al., 2019, 2020). This ability emerges early: infants as young as 6 months already demonstrate heightened efficiency in detecting facing compared to non-facing dyads (Goupil et al., 2022; Papeo & Abassi, 2019; Papeo et al., 2017). Privileged perception of social interactions appears to be a phylogenetically old predisposition (Deen et al., 2023), with evidence that even newly hatched domestic chicks (Gallus gallus) exhibit a preference for (seemingly socially) interacting pairs of lights over non-interacting ones (Zanon et al., 2024).

The perceptually grounded ability to process social interactions is intertwined with a propensity to perceive the world as causally structured. A classic demonstration is Michotte's (1963) launching effect, in which an object A collides with a resting object B, thereby giving the impression of causality, i.e., that A caused B's motion. This 'illusion of causality' (Michotte, 1963; Scholl & Tremoulet, 2000) emerges early in life (Galazka & Nyström, 2016; Leslie, 1982; Leslie & Keeble, 1987) and has been replicated with various action types (Gao et al., 2009; White & Milne, 1997), highlighting a core feature of human cognition independent of age and culture (Mascalzoni et al., 2013; Morris & Peng, 1994; Rimé et al., 1985). Here again, the propensity to perceive causal interactions as special appears to be older than humans, with various animal studies having reported a sensitivity to causal relations in the mechanical domain (Cacchione & Krist, 2004; Hanus, 2009; Mascalzoni et al., 2010; O'Connell & Dunbar, 2005).

In the social domain, causal events can generally be decomposed into event roles, agent and patient, each occupying a specific function in relation to the action. The dog chasing the postman and the postman chasing the dog involve the same entities and action but differ in terms of roles, or 'who does what to whom'. It has been argued that event roles are abstract categories that can be universally discriminated and generalised to any entity, as long as it can adopt agent (doer) or patient (receiver) characteristics (Dowty, 1991; Rissman & Majid, 2019), an ability that emerges in early infancy (Hamlin et al., 2007; Robertson & Suci, 1980). Some animals also keep track of event roles, in both the visual (Krupenye & Hare, 2018) and the auditory domain (Bergman et al., 2003; Clay et al., 2016; Slocombe et al., 2010; Slocombe & Zuberbühler, 2005; see Wilson et al., 2022 for a review). A recent study further demonstrated that great apes, when tasked with actively selecting between agents or patients of previously watched events, consistently made decisions influenced by the nature of the protagonists (i.e., animate or inanimate), as well as the nature of the interaction (Brocard et al., 2024). Overall, there is very little doubt that animals discriminate agents from patients, further suggesting evolutionary roots that predate the emergence of the human lineage, a finding with particular relevance for evolutionary theories of language (Wilson et al., 2022).

A peculiar feature of human event cognition is the speed and spontaneity with which event roles are identified (Dobel et al., 2007; Vettori et al., 2025). Humans can extract different types of social interactions as rapidly as the blink of an eye (Hafri et al., 2013; Isasi-Isasmendi et al., 2023). In one study, Hafri et al. (2018) asked human participants to react as rapidly as possible to an arbitrary visual feature in a depiction of people interacting with each other (e.g., always select the person wearing a blue shirt). Remarkably, if the person with the blue shirt suddenly switched from being the agent to being the patient (or the reverse), response latencies increased significantly, suggesting that the event roles were spontaneously encoded 'along the way' and interfered with the task at hand, the so-called 'switch cost effect' (Jersild, 1927). Switch cost effects have been used in a range of cognitive tasks as they are excellent tools for bringing to light unconscious or conflicting mental processes (Hafri et al., 2018; Jersild, 1927; Oosterwijk et al., 2012; Pecher et al., 2003; Rogers & Monsell, 1995; Spence et al., 2001; Stoet & Snyder, 2007; Vettori et al., 2025; Wiseheart et al., 2016).

A few studies with non-human animals have also used switch cost paradigms, but mainly to study low-level visual processing (pigeons: Castro & Wasserman, 2016; Meier et al., 2013, 2016; E. M. O’Donoghue et al., 2020; E. O’Donoghue & Wasserman, 2021; macaques: Avdagic et al., 2014; Caselli & Chelazzi, 2011; Smith & Beran, 2018; Stoet & Snyder, 2007). Here, we were interested in higher-level cognition, that is, whether great apes, if asked to attend to arbitrary low-level visual features, would also encode event roles, even if not specifically asked to do so, as had been demonstrated in humans (Hafri et al., 2018). Demonstrating that non-human primates, similar to humans, can rapidly and automatically extract the structure of social events through visual processing of the world would suggest that event roles are likely to be routinely and readily available to primate observers. This availability could serve as common ground, forming the foundation for encoding events in terms of roles within language. By contrast, if structuring events in terms of agent and patient roles required effortful deliberation, it would be less likely for multiple individuals to generate them simultaneously, potentially hindering the emergence of a shared external language.

Experimental Design

Humans and many animal species can recognise and correctly interpret still pictures, even without previous experience (see Bovet & Vauclair, 2000 for a review; hamadryas baboons: Kyes & Candland, 1987; long-tailed macaques: Kyes et al., 1992; rhesus monkeys: Rosenfeld & Van Hoesen, 1979; chimpanzees: Gardner & Gardner, 1984; Hayes & Hayes, 1953; Savage-Rumbaugh et al., 1980), and, if the pictures convey motion information, this can even create a sense of causality (adult humans: Freyd, 1983; Guterstam & Graziano, 2020; Kourtzi & Kanwisher, 2000; human infants: Shirai & Imura, 2016; rhesus macaques: McFarland et al., 2013). This literature encouraged us to adapt the original switch-cost protocol by Hafri et al. (2018) for great apes. To this end, we also capitalised on recent findings that hominids view and interpret events in terms of agent and patient roles (Brocard et al., 2024; Wilson et al., 2024), and presented pictures of pairs of chimpanzees, gorillas, humans or orangutans interacting in an agent-patient relationship (e.g., grooming, playing, touching). We then arbitrarily masked the agent or the patient with blue or green shading, colours that great apes like and discriminate easily (Pene et al., 2020; Wells et al., 2008). In contrast to Hafri et al. (2018), the stimuli were presented on a touchscreen allowing direct selection of a target, rather than pressing a button.

We first conducted a pilot study (see Supplementary Text S1) with two orangutans, which allowed us to fine-tune the final training and test protocols. Problems recognised during the pilot study were the large number of trials required to obtain a reward as well as the strong effect of the repetition of the target side. We thus decided to present the stimuli in pairs (instead of sequences of several stimuli) to enhance the likelihood of selecting the correct assigned target and to limit the effect of the repetition of the target’s side.

Subjects

Humans.

Ten undergraduate students (N = 7 females; mean age 21.0 ± 1.3 years; range [19; 24]) from the University of Neuchâtel (Switzerland) were recruited via e-mail. All participants had normal or corrected-to-normal vision and none of them were colour-blind. Before the start of the experiment, participants were given basic information about the procedure and received a detailed information sheet. They were instructed to find a (randomly assigned) coloured target on images depicting two interacting individuals and to touch it as rapidly and accurately as possible. All participants were naive to the purpose of the study, signed an informed consent form, completed a short questionnaire and attended a debriefing and question-and-answer session after the experiment. The entire experiment took approximately 2 hours to complete, with short breaks between blocks. Participants were offered CHF 40 for their participation.

Great Apes.

Subjects were recruited from a group of N = 15 chimpanzees and N = 7 gorillas housed at Basel Zoo, Switzerland. All individuals had access to indoor and outdoor enclosures (chimpanzees: 767 m2; gorillas: 753 m2) that contained ropes, hammocks, climbing structures and freshly provided material to build nests. Individuals were fed a mix of fruit and vegetables supplemented with small amounts of proteins, with several feeds distributed throughout the day and free access to water.

A total of N = 6 chimpanzees (N = 4 females; mean age 13.3 ± 15.1 years; range [4; 42]) and N = 4 gorillas (N = 3 females; mean age 15.5 ± 12.3 years; range [7; 33]) participated in the study. However, the training procedure turned out to be extremely arduous and discouraging for the great apes (as it was for humans), with the result that only one chimpanzee, Kume, the 19-year-old alpha male, completed the entire study. While this is an obvious shortcoming, Kume's stamina and eventual successful completion is informative of the cognitive capacities of chimpanzees as a species.

Ethical Statement.

This study was non-invasive and risk-free, and participation was on a voluntary basis for both species. The study was approved by the Ethics Committee of the University of Neuchâtel (project 101/2022), the Cantonal Veterinary Office of Basel-Stadt (cantonal permit number 3077) and the Animal Welfare Officer at Basel Zoo.

Stimuli

Stimuli were coloured photographs of two adult individuals from the same species (i.e., chimpanzees, gorillas, humans or orangutans) interacting in eight different ways (test trials; for details of event categories see Table S1) or simply being next to each other (training trials). The identities of the actors varied across nearly all stimuli, with limited repetition, and only in the human stimuli. All participants were exposed to all stimuli, i.e., actors of their own and the three other species. N = 40 images were used for the test (10 images per actor species) and N = 60 images for the training (15 images per species). Semi-transparent coloured masks were superimposed on both actors, either blue (RGB [0, 0, 255]) or green (RGB [0, 255, 0]), with transparency set to a maximum of 85% (Figure 1D–E), which resulted in four combinations, i.e., agent left or right, and in blue or green.
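For illustration only, the following minimal R sketch shows how a semi-transparent colour layer of the kind described above can be alpha-blended onto part of an image; the array layout, function name and blending formula are ours and are not the stimulus-generation code used in the study.

```r
# Illustrative alpha-blending of a semi-transparent colour mask onto part of an
# image stored as a height x width x 3 array with values in [0, 1].
blend_mask <- function(img, rows, cols, rgb = c(0, 0, 1), transparency = 0.85) {
  alpha <- 1 - transparency   # opacity of the coloured layer
  for (ch in 1:3) {
    img[rows, cols, ch] <- (1 - alpha) * img[rows, cols, ch] + alpha * rgb[ch]
  }
  img
}

# Example: mask the left half of a dummy 100 x 200 image in blue (RGB [0, 0, 255])
# and the right half in green (RGB [0, 255, 0]), both at 85% transparency.
img    <- array(runif(100 * 200 * 3), dim = c(100, 200, 3))
masked <- blend_mask(img, 1:100, 1:100, rgb = c(0, 0, 1))
masked <- blend_mask(masked, 1:100, 101:200, rgb = c(0, 1, 0))
```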

Figure 1.

Example of stimuli (A–D) for training phases 1 to 4 and (E) for the test. (A) Training phase 1: only the targeted colour is shown, with a 50% transparency mask; as many touches as needed to reach the target; feedback on every trial; sessions of 20 trials. (B) Training phase 2: both actors coloured with 50% transparency masks; only one touch allowed to press the correct target; feedback on all trials; 30 trials per daily session. (C) Training phase 3: same as phase 2 but with mask transparency set at 70%. (D) Training phase 4: mask transparency set at 85%; daily sessions of 40 trials; a sound reward on every trial but food delivered only every two trials, signalled by a new sound. (A–D) A single image is presented in its four possible configurations across the four phases. (E) Test: mask transparency set at 85%, and a sound reward on every trial but food delivered only every two trials, signalled by a new sound. Photo credit: Orangutan Jungle School, Season #1, NHNZ Worldwide.

Touchscreen Equipment

Data were collected on weekdays from 8:00 to 18:00 local time, from September 2022 to February 2024. The ape setup consisted of a touchscreen (Iiyama ProLite T1931SR, 19", 1280 × 1024 resolution, 5 ms response time, resistive technology) connected to a laptop computer (Dell Latitude 7420) that displayed the stimuli with Matlab's Psychophysics Toolbox Version 3 (Brainard, 1997; Pelli, 1997). For the chimpanzees, the setup was fixed in one enclosure and protected by a customised Plexiglas box (73 × 55 × 35 cm) open at the bottom, allowing the apes to access the screen and retrieve the rewards. For the gorillas, the screen was attached to the mesh of the enclosure at the start of each session and removed at the end (Figure S1). The touchscreen was calibrated every day before the start of the session with eGalaxTouch software (v.5.14.0.19810). Human data were also collected via a laptop (Lenovo Thinkpad T15) connected to a touchscreen (Iiyama ProLite T2553Mis-B1, 24", 1920 × 1080 resolution, 4 ms response time, infrared technology).

Training Phase

A target colour (blue or green) was randomly assigned to each participant at the beginning of the study. Humans were verbally instructed to select the targeted colour, whereas great apes underwent training. This consisted of rewarding the subjects for consistently selecting the actor masked in the assigned colour, regardless of actor species (all stimuli were presented in randomised order). Training consisted of four phases. In phase 1, a mask of the targeted colour (50% transparency; Figure 1A) was applied to only one actor, and participants were rewarded for every correct press. To pass phase 1, participants needed an accuracy of at least 80% in a single 20-trial session. In phase 2, the second actor was also coloured (Figure 1B) and participants had only one chance to select the correct target, with 80% accuracy in a 30-trial session required to pass this phase. The difficulty of the remaining two phases increased progressively as the transparency of the masks increased, making it more difficult to detect the target. In phase 3, the transparency increased to 70% (Figure 1C), and by the final phase 4, it was set to 85% (Figure 1D). In addition, in phase 4, food rewards were no longer delivered after each correct trial, but only after two correct trials, which was indicated by a new sound. Training was considered complete when participants obtained an accuracy of 80% in a session of 40 trials in phase 4. The number of trials per session was gradually increased because the great ape subjects were not used to completing so many trials in a single day.
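For concreteness, here is a small sketch of the phase-progression criteria just described; the session lengths, transparencies and the 80% threshold are taken from the text, while the helper function itself is hypothetical and not part of the study's software.

```r
# Hypothetical helper summarising the training criteria: a phase is passed once
# a single session of the required length reaches at least 80% accuracy.
phase_params <- data.frame(
  phase        = 1:4,
  n_trials     = c(20, 30, 30, 40),
  transparency = c(0.50, 0.50, 0.70, 0.85)
)

passed_phase <- function(session_correct, phase) {
  p <- phase_params[phase_params$phase == phase, ]
  length(session_correct) == p$n_trials && mean(session_correct) >= 0.80
}

# Example: a 40-trial phase-4 session with 34 correct responses (85%) passes.
passed_phase(c(rep(1, 34), rep(0, 6)), phase = 4)   # TRUE
```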

N = 6 chimpanzees and N = 4 gorillas participated in the training, but as mentioned, only one chimpanzee successfully completed all four training phases (dropouts, phase 1: N = 3; phase 2: N = 1; phase 3: N = 2; phase 4: N = 3), mostly due to lack of concentration and motivation. We opted against further prolonging the training, recognising that the crucial aspect for detecting a switch cost lies in attentiveness to the stimuli.

The N = 10 human participants were verbally instructed to pay attention to the actor masked by the assigned colour. Human participants could therefore skip the first three phases of the training and only completed N = 10 trials of phase 4 (Figure 1D).

Testing Phase

Chimpanzee.

For the chimpanzee we generated a list of N = 3,200 trials, sorted into four randomised blocks (based on the actor species depicted in the stimuli) of 800 trials each. Importantly, we ensured that the participant saw equally many pairs of stimuli with (i) the target on the same side with the same role, (ii) the target on the same side with a switched role, (iii) the target on the switched side with the same role, or (iv) the target with switched side and role. We also ensured that no more than two pairs of the same configuration followed one another. Each image was presented 80 times (20 presentations per configuration, a configuration being, for example, the agent in green on the left-hand side and the patient in blue on the other side; Figure 1E). As the identities of the actors varied in nearly all stimuli, no role-switching within event categories was implemented. Based on the results of the pilot study (see Supplementary Text S1), stimuli were presented as consecutive pairs of trials with a food reward after each pair, instead of as sequences of multiple trials.
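To make these pairing constraints concrete, the following sketch (under our own naming, not the authors' list-generation script) assembles one block of 800 trials as 400 pairs, balanced across the four side × role repetition configurations, with no more than two consecutive pairs of the same configuration.

```r
# Build a balanced sequence of pair configurations for one block: 100 pairs of
# each of the four side x role repetition combinations, never more than two
# identical configurations in a row (illustrative sketch only).
make_pair_configs <- function(n_per_config = 100) {
  configs <- c("side_rep.role_rep", "side_rep.role_sw",
               "side_sw.role_rep",  "side_sw.role_sw")
  total <- length(configs) * n_per_config
  repeat {
    remaining <- setNames(rep(n_per_config, length(configs)), configs)
    out <- character(0)
    ok  <- TRUE
    for (i in seq_len(total)) {
      allowed <- names(remaining)[remaining > 0]
      # forbid a third consecutive pair of the same configuration
      if (i > 2 && out[i - 1] == out[i - 2]) {
        allowed <- setdiff(allowed, out[i - 1])
      }
      if (length(allowed) == 0) { ok <- FALSE; break }   # dead end: start over
      nxt <- if (length(allowed) == 1) allowed else
        sample(allowed, 1, prob = remaining[allowed])
      out <- c(out, nxt)
      remaining[nxt] <- remaining[nxt] - 1
    }
    if (ok) return(out)
  }
}

set.seed(1)
block <- make_pair_configs(100)   # 400 pairs = 800 trials
table(block)                      # 100 pairs per configuration
max(rle(block)$lengths)           # <= 2: no three identical pairs in a row
```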

A test session started with the chimpanzee pressing the uniformly lit, green touchscreen, which triggered a positive sound and food delivery. The participant then completed three warm-up trials (i.e., images randomly positioned on the screen) and three training trials from phase 4 (Figure 1D). The participant then saw a circle of his targeted colour, which he had to press to gain access to the test session of N = 60 trials. Each trial consisted of an image depicting an agent acting on a patient, and the participant was required to select the actor masked in his targeted colour, which triggered sound feedback (i.e., correct or incorrect) and led to the next trial regardless of success. Once the participant made his second choice, sound feedback was again produced, followed by a "food sound" accompanied by the delivery of a piece of food. Following this, the circle of the targeted colour reappeared, which the participant needed to press to gain access to the next two trials. After completion of the N = 60 trials, the screen turned red, and the participant received three pieces of food to indicate the end of the session (Figure 2).

Figure 2.

Schematic illustration of the test session structure. Before starting the test, the chimpanzee completed three training trials while humans completed 10 training trials, all of phase 4 (Figure 1D). This was followed by 60 test trials for the chimpanzee and an entire block (N = 800 trials) for the humans. Images were presented in pairs starting with a circle of the targeted colour. At the end of a pair, a "food sound" was emitted and the chimpanzee received a food reward. The images were displayed full screen and the circle at 600 × 600 pixels. Photo credit: (training) Orangutan Jungle School, Season #1, NHNZ Worldwide; (tests) A. Isasi-Isasmendi, S. Sauppe, and C. Andrews.

Humans.

For human participants, we generated unique lists of N = 3,200 trials each, following the exact same procedure as for the chimpanzee. Human participants were tested in the same way as the chimpanzee, except that they did not receive warm-up trials but transitioned directly from training phase 4 to the test trials, and they did not receive food rewards but monetary compensation at the end of the experiment. In another difference, human participants were not tested in short sessions; rather, they completed an entire block of N = 800 trials (Figure 2), after which they were allowed to take a break before the next block of N = 800 trials, and so forth.

Reaction times (RTs) were determined as the time intervals from the stimulus display until the screen was touched and released. Accuracy was defined as the success in selecting the correct assigned colour target.

Data Processing

After removal of the first trial of each pair (for which no switch cost was possible), trial exclusion criteria were similar to those used by Hafri et al. (2018), namely: trials in which more than one press on the screen was necessary to reach a target (correct or incorrect), trials faster than 200 ms, trials with RTs slower or faster than 2.5 standard deviations of each subject's mean RT, and trials with incorrect answers (kept for the accuracy analyses). This resulted in 54.0% of trials excluded for the chimpanzee and an average of 5.7% of trials excluded for human participants.
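A minimal R sketch of these exclusion steps is given below. The data frame, its column names (subject, species, block, stimulus, trial_in_pair, n_presses, rt, correct, rep_side, rep_role, role_target, side_agent) and the exact ordering of the filters are our assumptions for illustration, not the authors' processing code.

```r
# Sketch of the trial-exclusion pipeline described above, applied to dummy data.
library(dplyr)

set.seed(2)
n <- 800
trials <- data.frame(
  subject       = rep(c("human_01", "chimp_kume"), each = n / 2),
  species       = rep(c("human", "chimpanzee"), each = n / 2),
  block         = factor(sample(1:4, n, replace = TRUE)),
  stimulus      = factor(sample(1:40, n, replace = TRUE)),
  trial_in_pair = rep(1:2, n / 2),
  n_presses     = sample(1:2, n, replace = TRUE, prob = c(0.92, 0.08)),
  rt            = rexp(n, rate = 1 / 400) + 250,   # reaction time in ms
  correct       = rbinom(n, 1, 0.9),
  rep_side      = sample(c("repeated", "switched"), n, replace = TRUE),
  rep_role      = sample(c("repeated", "switched"), n, replace = TRUE),
  role_target   = sample(c("agent", "patient"), n, replace = TRUE),
  side_agent    = sample(c("left", "right"), n, replace = TRUE)
)

clean_acc <- trials %>%
  filter(trial_in_pair == 2,   # drop the first trial of each pair (no switch possible)
         n_presses == 1,       # drop trials needing more than one press
         rt >= 200) %>%        # drop implausibly fast responses
  group_by(subject) %>%
  filter(abs(rt - mean(rt)) <= 2.5 * sd(rt)) %>%   # drop RTs beyond 2.5 SD of the subject mean
  ungroup()                    # incorrect trials are kept for accuracy analyses

clean_rt <- clean_acc %>% filter(correct == 1)     # RT analyses use correct trials only
```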

A preliminary analysis showed that human participants found it easier to detect blue than green actors, regardless of their event role (see Supplementary Text S2). Actors coloured in green were generally harder to detect, possibly because the natural surroundings in the photos were often of a greenish colour. We therefore split the dataset in two, and only report the performance of participants assigned the blue targets in the main results, which was consistent with the assigned colour of the single chimpanzee that successfully completed the training.

Data Analysis

Statistical analyses were conducted in R (R Core Team, 2021, v.4.0.5) using Bayesian generalised linear models fitted with the brms package (Bürkner, 2017, 2018, 2021), which interfaces with Stan (Carpenter et al., 2017). This approach was chosen because we aimed to quantify evidence both for and against differences between species, whereas frequentist methods only allow for the rejection of a null hypothesis.

Bayesian Bernoulli models with logit links were fitted to model (i) accuracy and (ii) response side. For the intercepts and all population-level predictors, normally distributed priors were assigned (μ = 0, σ = 1.5), resulting in relatively flat distributions on the logit scale. Exponentially distributed priors (λ = 1) were used for the standard deviation of group-level effects. When modelling accuracy, we included as predictors of interest the species of the participants, the repetition of the target's side and role along with their interaction, the event role of the target, and the side of the agent. To control for potential confounding factors, features of the stimuli (i.e., the colour of the agent, the species of the actors, and the event category; see Table S2 for formal definitions of all variables) were included as fixed effects, as well as random slopes by stimulus and block. When not modelled as predictors of interest, the side of the agent and the role of the target were also added as random slopes. In modelling the response side, our primary focus was on identifying trial features that could influence choices. We therefore included the species of the participant and its interactions with block, side of the agent, role of the target, and event category as fixed effects. Additionally, participants and stimuli were treated as random intercepts, with side and colour of the agent as random slopes for the stimuli.
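A simplified sketch of the Bernoulli accuracy model in brms is shown below, reusing the dummy data from the processing sketch above; the stimulus-feature controls and some random slopes from the full specification are omitted for brevity, and the variable names are ours, not the published model.

```r
# Simplified Bernoulli accuracy model with the priors described above.
library(brms)

priors_acc <- c(
  prior(normal(0, 1.5), class = "Intercept"),
  prior(normal(0, 1.5), class = "b"),
  prior(exponential(1), class = "sd")
)

m_acc <- brm(
  correct ~ species + rep_side * rep_role + role_target + side_agent +
    (1 + side_agent + role_target | stimulus) + (1 | block),
  data   = clean_acc,
  family = bernoulli(link = "logit"),
  prior  = priors_acc,
  chains = 4, cores = 4, seed = 1
)
```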

Reaction times were modelled by fitting Bayesian regressions using exponentially modified Gaussian distributions (Baayen & Milin, 2010), with logarithmic link functions for both sigma and beta. Normally distributed priors (μ = 0, σ = 2.5) were used for the intercept, all population-level predictors and the residual standard deviation sigma; exponentially distributed priors (λ = 1) were used for the standard deviation of group-level effects; and gamma distributed priors (k = 1, θ = 0.2) were used for the regression coefficient beta. The predictors of interest were the species of the participants and the repetition of the target's side and role, as well as their interactions. Additional population-level predictors were included to control for potential confounding effects (i.e., the side of the agent, the role of the target, the colour of the agent, the species of the actors, and the event category). Finally, these same effects were used as random slopes by stimuli, blocks and participants. Additional models were considered to assess the effect of blocks and, for human participants only, to control for the effect of the target colour (see Supplementary Text S2).
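A correspondingly simplified sketch of the ex-Gaussian reaction-time model is given below (again on the dummy clean_rt data); the reduced random-effect structure and the coefficient names are ours, and the gamma prior is expressed in shape/rate form under the assumption that the paper's θ = 0.2 denotes a scale parameter.

```r
# Simplified ex-Gaussian reaction-time model with the priors described above.
library(brms)

priors_rt <- c(
  prior(normal(0, 2.5), class = "Intercept"),
  prior(normal(0, 2.5), class = "b"),
  prior(exponential(1), class = "sd"),
  prior(gamma(1, 5), class = "beta")   # shape 1, rate 5 = shape 1, scale 0.2
)

m_rt <- brm(
  rt ~ species * rep_side * rep_role + role_target + side_agent +
    (1 | stimulus) + (1 | block) + (1 | subject),
  data   = clean_rt,
  family = exgaussian(),   # default links: identity (mu), log (sigma and beta)
  prior  = priors_rt,
  chains = 4, cores = 4, seed = 1
)
```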

For each outcome variable, several models were fitted with different combinations of fixed predictors. To determine the best fitting model for each outcome variable, these models were compared using their expected log pointwise predictive density (elpd) under leave-one-out cross-validation, approximated by importance sampling from the posterior (Vehtari et al., 2017; Yao et al., 2018). A higher elpd indicates better out-of-sample predictive fit, and thus a better model. The convergence of all models was checked through trace plots, R-hat values, and effective sample size (ESS) diagnostics. Posterior predictive checks were additionally used to visually assess the fit between the observed data and data simulated by the model.
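A sketch of this model-comparison step is shown below, contrasting the RT model above with a reduced version that drops the species terms; the model names and the particular reduced formula are illustrative choices, not the comparison set reported in the paper.

```r
# Approximate leave-one-out cross-validation and elpd-based model comparison.
library(brms)

m_rt_reduced <- update(
  m_rt,
  formula. = rt ~ rep_side * rep_role + role_target + side_agent +
    (1 | stimulus) + (1 | block) + (1 | subject)
)

loo_full    <- loo(m_rt)
loo_reduced <- loo(m_rt_reduced)
loo_compare(loo_full, loo_reduced)   # higher elpd indicates better out-of-sample fit
```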

For all predictors of interest, i.e., the population-level (fixed) effects (β), we report their medians and 90% credible intervals. Additionally, we report the posterior probability of the hypothesis that an estimate is either smaller or larger than 0 (or 0.5, depending on the specific hypothesis), denoted as P(β < 0) or similar, to quantify the directional evidence for the hypothesis. To illustrate the differences between parameter levels while controlling for all other variables, marginal effects were computed using the emmeans package (Lenth et al., 2018). These marginal effects were expressed either as contrasts on the response scale, indicating the direct difference in predicted outcomes between parameter levels, or as odds ratios, which describe the relative odds of an event occurring under different parameter conditions. Throughout the analyses, an effect was considered robust if at least 90% of posterior estimates were either above chance (0.5) or different from 0, i.e., P(P > 0.5) ≥ 0.90 or P(β > 0) ≥ 0.90.
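The reporting steps can be sketched as follows, using the RT model fitted above; the coefficient name passed to hypothesis() depends on factor coding and is therefore illustrative.

```r
# Marginal effects via emmeans and directional posterior probabilities.
library(emmeans)
library(brms)

# Marginal means of RT by role repetition, with the switched - repeated
# contrast and 90% credible intervals.
emm_role <- emmeans(m_rt, ~ rep_role)
confint(contrast(emm_role, method = "revpairwise"), level = 0.90)

# Directional evidence for a population-level effect, e.g. P(beta < 0).
hypothesis(m_rt, "rep_roleswitched < 0")   # coefficient name is illustrative
```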

Accuracy

In the test trials, participants saw photographs of two hominids (chimpanzees, gorillas, humans or orangutans) engaged in a social interaction, with one actor coloured in blue and the other in green (Figure 1E; N = 40 photographs). By means of verbal instruction (humans) or training (great apes), participants were required to always select the actor tinted with their assigned colour, regardless of its event role. As explained above, we only analysed the responses of participants assigned blue targets.

After data processing, we found a marked difference in accuracy between the humans and the chimpanzee. While humans were highly accurate in selecting the blue target (mean accuracy[Humans] = 99.5% ± 7.1%), the chimpanzee struggled much more with this task (mean accuracy[Chimpanzee] = 66.3% ± 47.3%).

Since humans were so highly accurate, we proceeded with analysing the chimpanzee data only and found that the best fitting model (see Table S3 for all comparisons) included only the event role of the target as predictor, with accuracy being better when the target was the patient (marginal effect: median odds ratio [Role Target = Patient vs. Role Target = Agent] = 2.206, 90% CI = [1.119; 4.437], meaning that the chimpanzee's odds of correctly choosing the target when it was a patient were approximately double those when it was an agent; posterior median P[Role Target = Agent] = 0.312, 90% CI = [0.188; 0.480], P(P[Role Target = Agent] < 0.5) = 0.97; posterior median P[Role Target = Patient] = 0.689, 90% CI = [0.523; 0.813], P(P[Role Target = Patient] > 0.5) = 0.97).

Switch Cost Effects - Spatial Continuity and Event Roles

If participants were capable of keeping track of targets from one trial to the next, we predicted an increase in reaction time in responding to the blue target if it unexpectedly switched sides, in breach of a basic feature of reality: the spatial continuity of bodies. Second, if participants also kept track of the event roles assigned to the actors by the social interaction, we predicted an increase in reaction time in responding to the blue target if its event role unexpectedly switched from agent to patient or vice versa.

Humans and the chimpanzee had similar reaction times in selecting their blue targets (median RT[Humans] = 747.2 ms; median RT[Chimpanzee] = 764.3 ms).

Including the repetitions of the target's side and the target's role as additive effects, without the species of the participants, slightly improved the fit of the model to the data, as shown by model comparisons (Δelpd[Best model vs. Model with the Participant's Species] = −0.5, SE[Δelpd] = 0.7; see Table S4). This best fitting model revealed an effect of the repetition of the target's event role (β[Repetition Target Role]: median = −4.99, 90% CI = [−8.30; −1.73], P(β < 0) = 1) and an effect of the repetition of the target's side (β[Repetition Target Side]: median = −13.44, 90% CI = [−16.76; −10.16], P(β < 0) = 1; Figure 3).

Figure 3.

Posterior probabilities of reaction time differences arising from the target switching (A) its side and (B) its event role. Plotted data are posterior probabilities from the best fitting model of the RT differences associated with the target's event role and side in the photographs. The dashed line corresponds to no difference (0). The points represent the medians of the estimates (values specified above) and the three levels of thickness of the point intervals represent the 30, 60 and 90% credible intervals.

Interfering with the spatial arrangement had a considerably stronger impact on processing time than interfering with the event role of a social interaction (Table 1). In particular, if the target suddenly switched sides from left to right (or vice versa) from one trial to the next, the reaction times of all participants were affected (medians: Δ[Humans] = 54 ms slower; Δ[Chimpanzee] = 48 ms faster), with no robust statistical difference between species (marginal effect when the side was repeated: median[Humans − Chimpanzee] = −17.9, 90% CI = [−323; 308]; marginal effect when the side was switched: median[Humans − Chimpanzee] = 4.01, 90% CI = [−310; 324]). This demonstrates that participants identified targets individually and tracked their spatial position within the social interaction.

Table 1.

Participants’ median reaction time changes if spatial position or event roles were kept identical or switched from one trial to the next, including switch costs and median marginal effects from the best fitting model

Condition | Participant's species | RT repeated, ms (SD) | RT switched, ms (SD) | Switch cost (ms) | Median marginal effect (90% CI)
Side | Human | 723.3 (342) | 777.4 (352) | 54.1 | 26.9 [21.3; 32.4]
Side | Chimpanzee | 791.3 (939) | 743.4 (846) | −47.9 |
Role | Human | 744.3 (349) | 750.6 (347) | 6.32 | 9.98 [4.68; 15.5]
Role | Chimpanzee | 757.4 (863) | 773.9 (934) | 16.5 |

Note. Reaction times were extracted from the raw data, while median marginal effects were extracted from the best model and are the difference between Switched and Repeated. The switch cost was defined as the difference between reaction times to switched and repeated spatial location or event role, respectively.
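As a worked example of this definition, using the raw median reaction times for the side condition in Table 1 (values in ms; the object name is ours):

```r
# Switch cost = RT(switched) - RT(repeated), using Table 1's raw values (ms).
rt_side <- data.frame(
  species  = c("Human", "Chimpanzee"),
  repeated = c(723.3, 791.3),
  switched = c(777.4, 743.4)
)
rt_side$switch_cost <- rt_side$switched - rt_side$repeated   # 54.1 and -47.9 ms
rt_side
```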

Importantly, if the event role of the target suddenly switched from one trial to the next, then reaction times increased for all participants, but again with no robust statistical difference between the two species (medians: Δ[Human] = 6 ms; Δ[Chimpanzee] = 16 ms; marginal effect when the role was repeated: median[Humans − Chimpanzee] = 4.05, 90% CI = [−312; 330]; marginal effect when the role was switched: median[Humans − Chimpanzee] = −18.0, 90% CI = [−337; 305]), demonstrating that all participants processed the event roles of the targets in the depicted social interaction (Table 1).

Both species were faster if the target was an agent rather than a patient (medians: Δ[Human] = 17 ms; Δ[Chimpanzee] = 49 ms), although this effect was not statistically robust (marginal effect: median[Agent − Patient] = −10.5, 90% CI = [−58.4; 37.9]). Finally, the chimpanzee was faster if the target changed from agent to patient relative to a change from patient to agent (median Δ = 21 ms), while humans were slower (median Δ = 10 ms), but not robustly so (marginal effect: median[(Agent to Patient) − (Patient to Agent)] = −10.4, 90% CI = [−57.7; 36.0]).

Switch Cost Effects - Conspecific Actors Only

When considering the subset of trials with social interactions performed by conspecifics only (humans responding to human interactions, the chimpanzee responding to chimpanzee interactions), model comparison (Table S5) showed that including the participant's species, the repetition of the target's event role and the repetition of the target's side, as well as their interactions, improved the fit of the model to the data (Δelpd[Best fitting model vs. Null model] = −10.9, SE[Δelpd] = 6.8). This best fitting model revealed an interaction between the repetition of the target's event role and side (β[Repetition Target Role × Repetition Target Side]: median = 53.4, 90% CI = [31.1; 75.5], P(β > 0) = 1), an interaction between the species of the participant and the repetition of the target's role (β[Human × Repetition Target Role]: median = −24.0, 90% CI = [−47.0; −1.4], P(β < 0) = 0.96), as well as a three-way interaction between the species of the participant and the repetition of the target's role and side (β[Human × Repetition Target Side × Repetition Target Role]: median = −55.8, 90% CI = [−79.0; −32.7], P(β < 0) = 1).

Humans paid a switch cost if both spatial continuity and event role were disrupted between two consecutive trials (marginal effect: median[(Role switched × Side switched) − (Role repeated × Side repeated)] = 39.8, 90% CI = [23.3; 58.2]), while the chimpanzee did not (marginal effect: median[(Role switched × Side switched) − (Role repeated × Side repeated)] = −21.9, 90% CI = [−82.8; 34.9]). However, the chimpanzee showed a facilitation effect (faster reaction times) when only the role changed and spatial continuity was maintained (marginal effect: median[(Role switched × Side repeated) − (Role repeated × Side repeated)] = −145.4, 90% CI = [−208.3; −81.8]), while humans showed a cost (slower reaction times) when only the side changed and the role of the target was maintained (marginal effect: median[(Role repeated × Side switched) − (Role repeated × Side repeated)] = 35.4, 90% CI = [18.5; 52.4]).

Other Effects

Similar to the two orangutans in the pilot study (see Supplementary Text S1), the side of the response (i.e., pressing on the left- or right-hand side) was better explained by the side of the agent than by the null model (Δelpd[Best fitting model vs. Null model] = −8.2, SE[Δelpd] = 3.8), but robustly so only for the chimpanzee (chimpanzee: marginal effect median[Agent Left − Agent Right] = 1.515 odds ratio, 90% CI = [1.163; 1.926], meaning that the chimpanzee's odds of selecting the actor on the right side (the dependent variable) were approximately 1.5 times higher if the agent was on the left (opposite) side; in other words, the chimpanzee demonstrated a preference for selecting the patient over the agent; posterior median P[Chimpanzee selects the actor on the right side × Side Agent Right] = 0.385, 90% CI = [0.326; 0.447], P(P < 0.5) = 1; humans: marginal effect median[Agent Left − Agent Right] = 0.954 odds ratio, 90% CI = [0.899; 1.016], posterior median P[Human selects the actor on the right side × Side Agent Right] = 0.614, 90% CI = [0.552; 0.673], P(P > 0.5) = 1). This result for the chimpanzee was also accompanied by a side bias (left/right ratio = 1.62), which might explain the low accuracy rate and the high proportion of data removed.

Block also had an effect on reaction times, with a statistically robust interaction with the repetition or switch of the target's role and side (Δelpd[Model with interactions vs. Model without interactions] = −5.0, SE[Δelpd] = 4.7). Participants had robustly longer reaction times in the first block only (marginal effects: median[Block1 − Block2] = 75.0, 90% CI = [14.6; 139.9]; median[Block1 − Block3] = 112.0, 90% CI = [15.7; 203.3]; median[Block1 − Block4] = 129.2, 90% CI = [67.5; 190.9]; median[Block2 − Block3] = 39.8, 90% CI = [−45.3; 120.8]; median[Block2 − Block4] = 57.0, 90% CI = [−16.7; 125.1]; median[Block3 − Block4] = 16.1, 90% CI = [−66.1; 98.9]; see Table S6 for all contrasts with the interaction).

The present study aimed to investigate whether great apes, similar to humans, rapidly and spontaneously encode event roles when required to find a coloured target in images depicting natural social interactions of an agent acting on a patient. Even though only one non-human primate, a chimpanzee, completed the full study, including training and tests, our study generated very similar results for the chimpanzee and the humans. Both species were slower to select targets when their roles changed, relative to when their roles remained the same across consecutive stimuli, suggesting that all participants spontaneously encoded the event structure by processing the event roles (i.e., agent, patient) played by the actors, aligning with findings from human psychology (Hafri et al., 2018).

This switch cost effect appeared to be somewhat stronger in the non-human primate, although the species difference was not statistically robust, with the chimpanzee taking approximately 4 ms longer (median marginal effect) than humans to respond when the role of the target switched. The human reaction times were, however, consistent with previously reported results (Hafri et al., 2018; Vettori et al., 2025). Regardless of the magnitude of the cost, the data suggest that both species processed event roles rapidly and spontaneously and that such processing induced a cognitive cost when the role of the target switched. More generally, the finding supports the hypothesis that the ability to decompose events in terms of event roles might not be uniquely human but shared with other primates, indicating deep evolutionary roots of event cognition (Wilson et al., 2022). Our study adds to this theory, suggesting that the ability to automatically assign event roles might have emerged as a spontaneous cognitive routine in early hominids, and possibly also in other groups of animals. As such, it may have played a key role as a blueprint for the rapid event encoding mechanism that underlies grammatically structured language.

Participants of both species also experienced a switch cost when the spatial continuity of actors was disrupted between consecutive trials (i.e., if A was to the left of B and then to the right of B on the next trial). Interfering with the spatial arrangement incurred a substantial switch cost, even more so for humans if the conspecifics' roles also changed, requiring participants to recode information on both dimensions. Social interactions are not just facing dyads, but are also grounded in a place (Papeo, 2020; Papeo et al., 2019; Skripkauskaite et al., 2023; Su et al., 2016; Vestner et al., 2019, 2020), suggesting that detection and understanding of social events also require locational information. It is reasonable to assume that before processing event roles, the brain first encodes the spatial disposition, which might explain why both species in the current study demonstrated a larger switch cost for spatial re-arrangement than for event role disruption. A recent study further demonstrated that altering visual properties (i.e., body postures) from one stimulus to the next induced a switch cost effect in humans (Vettori et al., 2025).

Finally, some human participants may have had difficulties discriminating between agents and patients in the ape interactions, which might account for some of the switch cost differences between the chimpanzee and the humans. Such difficulties in agent/patient discrimination might have prevented the emergence of a switch cost on those trials, resulting in an overall lower switch cost. This might also explain why human participants were slower to answer when the side of the target switched from one stimulus to the next, while the chimpanzee, as well as human participants in similar studies, were faster (Hafri et al., 2018; Vettori et al., 2025).

Despite similarities in reaction times, the accuracy data revealed notable differences between the humans and the chimpanzee. Humans demonstrated near-perfect accuracy, whereas the chimpanzee showed markedly lower accuracy. This difference may be attributed to several factors, including differences in training between species, task familiarity, attentional capacities, interest in the task, and a lack of understanding of the task. Despite these concerns, the chimpanzee participant passed all training thresholds, went through warm-up training trials prior to the tests, and received sound feedback throughout the sessions. A more plausible explanation of the difference might be a lack of attention or a rush to complete a sequence to receive the food reward. In addition, in a task switching paradigm, Stoet and Snyder (2007) stressed that human participants, who received roughly a third of the training given to the monkeys in their study, showed an increased switch cost effect for reaction times but a decreased switch cost effect for accuracy. Thus, the large number of training trials (ca. 1,000) undergone by the chimpanzee participant might have prevented the emergence of a switch cost effect in accuracy and might explain the seemingly stronger switch cost effect on reaction times compared to the human participants in this study. The chimpanzee's lower accuracy might also have been influenced by his side bias, which could have impacted overall performance.

Interestingly, the chimpanzee was more accurate when the target had a patient role than when it had an agent role, suggesting two non-exclusive interpretations. One possibility is that the chimpanzee wrongly learned to associate the target with the patient role rather than with the colour blue. However, this seems highly unlikely, as the training stimuli were specifically designed to exclude any agent-patient relationships, making it nearly impossible for the chimpanzee to have formed such an association. An alternative explanation is that patients might be inherently easier to process, reducing cognitive demands and minimising interference with the target detection task. This interpretation could also explain why the chimpanzee was faster to respond when the target's role switched from agent to patient but slower when the target switched from patient to agent.

One limitation of this study, already pointed out, is that our results are based on a single chimpanzee. This limits the generalisability of the findings, especially as previous studies have stressed individual variation in sensitivity to switch cost effects (Caselli & Chelazzi, 2011; Stoet & Snyder, 2007). However, conducting such a protocol with non-human animals, participating voluntarily and under no water or calorie restrictions, remains a demanding task, as participants need to remain focused and motivated for long periods of time. This difficulty appeared quite clearly in the lower accuracy rate of the chimpanzee compared to humans, and even more so in the number of participants that dropped out of the training. Recently, Papeo et al. (2024) used habituation and pupillometry methods to test 7-month-old infants' reactions to switching event roles. Such methods could prove highly advantageous for adapting similar protocols to non-human great apes, as they require minimal training and significantly fewer trials.

Additionally, due to the challenges encountered during data collection, this study lacks a control condition, such as a “back-to-back” scenario. Previous research (Hafri et al., 2018; Vettori et al., 2025) has suggested that changes in posture and orientation may account for at least some of the observed switch-cost effects, though not all. Hence, it would be valuable in future research to explore whether, like humans, non-human great apes exhibit similar sensitivities to body posture and orientation when processing and assigning event roles.

In conclusion, this study provides preliminary evidence that a non-human primate, like humans, rapidly and spontaneously encoded event roles from social interactions, as indicated by the role switch cost effect. This finding suggests that the cognitive ability to automatically process agents and patients in social events may be evolutionarily ancient and shared with our closest living relatives. The paradigm demonstrated how hard, if not impossible, it is to override or ignore a deeply anchored propensity to process events in terms of agents causing changes in patients, in both a human and a non-human hominid, providing valuable insights into the early evolution of the language faculty eventually capable of syntax (Zuberbühler, 2019, 2022; Zuberbühler & Bickel, 2022).

We thank Andrea Cruz Alegria for helping in data collection and Flurin Baer, Adrian Baumeyer, Markus Beutle, Rene Buob, Nicole Fischer, Roland Kleger, Stephan Lopez, Gaby Rindlisbacher, Jonas Schaub, Amanda Spillmann, Lukas Staenke, Patrick Wyser, Fabia Wyss, Dominic Wyss, Corinne Zollinger, Reto Lehmann & the technician team for support and assistance at Basel Zoo. We thank Caroline Andrews, Arrate Isasi-Isasmendi and Sebastian Sauppe for providing some of the stimuli. We are also grateful to TTF DataScience NCCR@LiRi for their advice.

This research was funded by the Swiss National Science Foundation (project grant numbers 310030_185324, KZ, and 100015_182845, BB) and the National Center for Competence in Research “Evolving Language” (SNSF agreement number 51NF40_180888, BB, KZ).

SB: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Resources; Visualization; Writing – original draft; Writing – review & editing. PVV: Conceptualization; Methodology; Resources; Writing – review & editing. BB: Conceptualization; Formal analysis; Funding acquisition; Methodology; Resources; Supervision; Visualization; Writing – review & editing. KZ: Conceptualization; Formal analysis; Funding acquisition; Methodology; Resources; Supervision; Visualization; Writing – review & editing.

Data and code are available at https://doi.org/10.60544/wttr-a816.

Alberts
,
S. C.
(
2019
).
Social influences on survival and reproduction: Insights from a long-term study of wild baboons
.
Journal of Animal Ecology
,
88
(),
47
66
. ,
[PubMed]
Avdagic
,
E.
,
Jensen
,
G.
,
Altschul
,
D.
, &
Terrace
,
H. S.
(
2014
).
Rapid cognitive flexibility of rhesus macaques performing psychophysical task-switching
.
Animal Cognition
,
17
(
3
),
619
631
. ,
[PubMed]
Baayen
,
R. H.
, &
Milin
,
P.
(
2010
).
Analyzing reaction times
.
International Journal of Psychological Research
,
3
(
2
),
12
28
.
Bergman
,
T. J.
,
Beehner
,
J. C.
,
Cheney
,
D. L.
, &
Seyfarth
,
R. M.
(
2003
).
Hierarchical classification by rank and kinship in baboons
.
Science
,
302
(
5648
),
1234
1236
. ,
[PubMed]
Bovet
,
D.
, &
Vauclair
,
J.
(
2000
).
Picture recognition in animals and humans
.
Behavioural Brain Research
,
109
(
2
),
143
165
. ,
[PubMed]
Brainard
,
D. H.
(
1997
).
The psychophysics toolbox
.
Spatial Vision
,
10
(
4
),
433
436
.
[PubMed]
Brocard
,
S.
,
Wilson
,
V. A. D.
,
Berton
,
C.
,
Zuberbühler
,
K.
, &
Bickel
,
B.
(
2024
).
A universal preference for animate agents in hominids
.
iScience
,
27
(
6
),
109996
. ,
[PubMed]
Bürkner
,
P.-C.
(
2017
).
brms: An R package for bayesian multilevel models using Stan
.
Journal of Statistical Software
,
80
(
1
),
1
28
.
Bürkner
,
P.-C.
(
2018
).
Advanced bayesian multilevel modeling with the R package brms
.
R Journal
,
10
(
1
),
395
411
.
Bürkner
,
P.-C.
(
2021
).
Bayesian item response modeling in R with brms and Stan
.
Journal of Statistical Software
,
100
(
5
),
1
54
.
Cacchione
,
T.
, &
Krist
,
H.
(
2004
).
Recognizing impossible object relations: Intuitions about support in chimpanzees (Pan troglodytes)
.
Journal of Comparative Psychology
,
118
(
2
),
140
148
. ,
[PubMed]
Carpenter
,
B.
,
Gelman
,
A.
,
Hoffman
,
M. D.
,
Lee
,
D.
,
Goodrich
,
B.
,
Betancourt
,
M.
,
Brubaker
,
M. A.
,
Guo
,
J.
,
Li
,
P.
, &
Riddell
,
A.
(
2017
).
Stan: A probabilistic programming language
.
Journal of Statistical Software
,
76
(
1
). ,
[PubMed]
Caselli
,
L.
, &
Chelazzi
,
L.
(
2011
).
Does the macaque monkey provide a good model for studying human executive control? A comparative behavioral study of task switching
.
PLoS ONE
,
6
(
6
),
e21489
. ,
[PubMed]
Castro
,
L.
, &
Wasserman
,
E. A.
(
2016
).
Executive control and task switching in pigeons
.
Cognition
,
146
,
121
135
. ,
[PubMed]
Cheney
,
D. L.
,
Silk
,
J. B.
, &
Seyfarth
,
R. M.
(
2016
).
Network connections, dyadic bonds and fitness in wild female baboons
.
Royal Society Open Science
,
3
(
7
),
160255
. ,
[PubMed]
Clay
,
Z.
,
Ravaux
,
L.
,
de Waal
,
F. B. M.
, &
Zuberbühler
,
K.
(
2016
).
Bonobos (Pan paniscus) vocally protest against violations of social expectations
.
Journal of Comparative Psychology
,
130
(
1
),
44
54
. ,
[PubMed]
Deen
,
B.
,
Schwiedrzik
,
C. M.
,
Sliwa
,
J.
, &
Freiwald
,
W. A.
(
2023
).
Specialized networks for social cognition in the primate brain
.
Annual Review of Neuroscience
,
46
,
381
401
. ,
[PubMed]
Dobel
,
C.
,
Gumnior
,
H.
,
Bölte
,
J.
, &
Zwitserlood
,
P.
(
2007
).
Describing scenes hardly seen
.
Acta Psychologica
,
125
(
2
),
129
143
. ,
[PubMed]
Dowty
,
D.
(
1991
).
Thematic proto-roles and argument selection
.
Language
,
67
(
3
),
547
619
.
Freyd
,
J. J.
(
1983
).
The mental representation of movement when static stimuli are viewed
.
Perception & Psychophysics
,
33
(
6
),
575
581
. ,
[PubMed]
Galazka
,
M.
, &
Nyström
,
P.
(
2016
).
Infants’ preference for individual agents within chasing interactions
.
Journal of Experimental Child Psychology
,
147
,
53
70
. ,
[PubMed]
Gao
,
T.
,
Newman
,
G. E.
, &
Scholl
,
B. J.
(
2009
).
The psychophysics of chasing: A case study in the perception of animacy
.
Cognitive Psychology
,
59
(
2
),
154
179
. ,
[PubMed]
Gardner
,
R. A.
, &
Gardner
,
B. T.
(
1984
).
A vocabulary test for chimpanzees (Pan troglodytes)
.
Journal of Comparative Psychology
,
98
(
4
),
381
404
. ,
[PubMed]
Goupil
,
N.
,
Papeo
,
L.
, &
Hochmann
,
J.-R.
(
2022
).
Visual perception grounding of social cognition in preverbal infants
.
Infancy
,
27
(
2
),
210
231
. ,
[PubMed]
Guterstam
,
A.
, &
Graziano
,
M. S. A.
(
2020
).
Implied motion as a possible mechanism for encoding other people’s attention
.
Progress in Neurobiology
,
190
,
101797
. ,
[PubMed]
Hafri
,
A.
,
Papafragou
,
A.
, &
Trueswell
,
J. C.
(
2013
).
Getting the gist of events: Recognition of two-participant actions from brief displays
.
Journal of Experimental Psychology: General
,
142
(
3
),
880
905
. ,
[PubMed]
Hafri
,
A.
,
Trueswell
,
J. C.
, &
Strickland
,
B.
(
2018
).
Encoding of event roles from visual scenes is rapid, spontaneous, and interacts with higher-level visual processing
.
Cognition
,
175
,
36
52
. ,
[PubMed]
Hamlin
,
J. K.
,
Wynn
,
K.
, &
Bloom
,
P.
(
2007
).
Social evaluation by preverbal infants
.
Nature
,
450
(
7169
),
557
559
. ,
[PubMed]
Hanus
,
D.
(
2009
).
Great apes’ causal cognition in the physical domain
.
Universität zu Berlin
.
Hayes
,
K. J.
, &
Hayes
,
C.
(
1953
).
Picture perception in a home-raised chimpanzee
.
Journal of Comparative and Physiological Psychology
,
46
(
6
),
470
474
. ,
[PubMed]
Isasi-Isasmendi
,
A.
,
Andrews
,
C.
,
Flecken
,
M.
,
Laka
,
I.
,
Daum
,
M. M.
,
Meyer
,
M.
,
Bickel
,
B.
, &
Sauppe
,
S.
(
2023
).
The agent preference in visual event apprehension
.
Open Mind
,
7
,
240
282
. ,
[PubMed]
Jerslid
,
A. T.
(
1927
).
Mental set and shift: Vol. whole no. 89
.
Archives of Psychology
.
Kourtzi
,
Z.
, &
Kanwisher
,
N.
(
2000
).
Activation in human MT/MST by static images with implied motion
.
Journal of Cognitive Neuroscience
,
12
(
1
),
48
55
. ,
[PubMed]
Krupenye
,
C.
, &
Hare
,
B.
(
2018
).
Bonobos prefer individuals that hinder others over those that help
.
Current Biology
,
28
(
2
),
280
286
. ,
[PubMed]
Kyes
,
R. C.
, &
Candland
,
D. K.
(
1987
).
Baboon (Papio hamadryas) visual preferences for regions of the face
.
Journal of Comparative Psychology
,
101
(
4
),
345
348
. ,
[PubMed]
Kyes
,
R. C.
,
Elizabeth Mayer
,
K.
, &
Bunnell
,
B. N.
(
1992
).
Perception of stimuli presented as photographic slides in cynomolgus macaques (Macaca fascicularis)
.
Primates
,
33
(
3
),
407
412
.
Lenth, R., Singmann, H., Love, J., Buerkner, P., & Herve, M. (2018). Package “emmeans”. R package version 4.0-3. https://cran.r-project.org/package=emmeans
Leslie, A. M. (1982). The perception of causality in infants. Perception, 11(2), 173–186.
Leslie, A. M., & Keeble, S. (1987). Do six-month-old infants perceive causality? Cognition, 25(3), 265–288.
Mascalzoni, E., Regolin, L., & Vallortigara, G. (2010). Innate sensitivity for self-propelled causal agency in newly hatched chicks. Proceedings of the National Academy of Sciences, 107(9), 4483–4485.
Mascalzoni, E., Regolin, L., Vallortigara, G., & Simion, F. (2013). The cradle of causal reasoning: Newborns’ preference for physical causality. Developmental Science, 16(3), 327–335.
McFarland, R., Roebuck, H., Yan, Y., Majolo, B., Li, W., & Guo, K. (2013). Social interactions through the eyes of macaques and humans. PLoS ONE, 8(2), e56437.
McMahon, E., & Isik, L. (2023). Seeing social interactions. Trends in Cognitive Sciences, 27(12), 1165–1179.
Meier, C., Lea, S. E. G., Forrest, C. L. D., Angerer, K., & McLaren, I. P. L. (2013). Comparative evidence for associative learning in task switching. 35th Annual Conference of the Cognitive Science Society, Austin, TX.
Meier, C., Lea, S. E. G., & McLaren, I. P. L. (2016). Task-switching in pigeons: Associative learning or executive control? Journal of Experimental Psychology: Animal Learning and Cognition, 42(2), 163–176.
Michotte, A. (1963). The perception of causality. Basic Books.
Morris, M. W., & Peng, K. (1994). Culture and cause: American and Chinese attributions for social and physical events. Journal of Personality and Social Psychology, 67(6), 949–971.
O’Connell, S., & Dunbar, R. I. M. (2005). The perception of causality in chimpanzees (Pan spp.). Animal Cognition, 8(1), 60–66.
O’Donoghue, E. M., Broschard, M. B., & Wasserman, E. A. (2020). Pigeons exhibit flexibility but not rule formation in dimensional learning, stimulus generalization, and task switching. Journal of Experimental Psychology: Animal Learning and Cognition, 46(2), 107–123.
O’Donoghue, E., & Wasserman, E. A. (2021). Pigeons proficiently switch among four tasks without cost. Journal of Experimental Psychology: Animal Learning and Cognition, 47(2), 150–162.
Oosterwijk, S., Winkielman, P., Pecher, D., Zeelenberg, R., Rotteveel, M., & Fischer, A. H. (2012). Mental states inside out: Switching costs for emotional and nonemotional sentences that differ in internal and external focus. Memory & Cognition, 40(1), 93–100.
Papeo, L. (2020). Twos in human visual perception. Cortex, 132, 473–478.
Papeo, L., & Abassi, E. (2019). Seeing social events: The visual specialization for dyadic human–human interactions. Journal of Experimental Psychology: Human Perception and Performance, 45(7), 877–888.
Papeo, L., Goupil, N., & Soto-Faraco, S. (2019). Visual search for people among people. Psychological Science, 30(10), 1483–1496.
Papeo, L., Stein, T., & Soto-Faraco, S. (2017). The two-body inversion effect. Psychological Science, 28(3), 369–379.
Papeo, L., Vettori, S., Serraille, E., Odin, C., Rostami, F., & Hochmann, J.-R. (2024). Abstract thematic roles in infants’ representation of social events. Current Biology, 34(18), 4294–4300.
Pecher, D., Zeelenberg, R., & Barsalou, L. W. (2003). Verifying different-modality properties for concepts produces switching costs. Psychological Science, 14(2), 119–124.
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10(4), 437–442.
Pene, C. H. M., Muramatsu, A., & Matsuzawa, T. (2020). Color discrimination and color preferences in chimpanzees (Pan troglodytes). Primates, 61(3), 403–413.
R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Rimé, B., Boulanger, B., Laubin, P., Richir, M., & Stroobants, K. (1985). The perception of interpersonal emotions originated by patterns of movement. Motivation and Emotion, 9(3), 241–260.
Rissman, L., & Majid, A. (2019). Thematic roles: Core knowledge or linguistic construct? Psychonomic Bulletin & Review, 26(6), 1850–1869.
Robertson, S. S., & Suci, G. J. (1980). Event perception by children in the early stages of language production. Child Development, 51(1), 89–96.
Rogers, R. D., & Monsell, S. (1995). Costs of a predictable switch between simple cognitive tasks. Journal of Experimental Psychology: General, 124(2), 207–231.
Rosenfeld, S. A., & Van Hoesen, G. W. (1979). Face recognition in the rhesus monkey. Neuropsychologia, 17(5), 503–509.
Savage-Rumbaugh, E. S., Rumbaugh, D. M., Smith, S. T., & Lawson, J. (1980). Reference: The linguistic essential. Science, 210(4472), 922–925.
Scholl, B. J., & Tremoulet, P. D. (2000). Perceptual causality and animacy. Trends in Cognitive Sciences, 4(8), 299–309.
Shirai, N., & Imura, T. (2016). Emergence of the ability to perceive dynamic events from still pictures in human infants. Scientific Reports, 6(1), 37206.
Silk, J. B. (2007). Social components of fitness in primate groups. Science, 317(5843), 1347–1351.
Skripkauskaite, S., Mihai, I., & Koldewyn, K. (2023). Attentional bias towards social interactions during viewing of naturalistic scenes. Quarterly Journal of Experimental Psychology, 76(10), 2303–2311.
Slocombe, K. E., Kaller, T., Call, J., & Zuberbühler, K. (2010). Chimpanzees extract social information from agonistic screams. PLoS ONE, 5(7), e11473.
Slocombe, K. E., & Zuberbühler, K. (2005). Agonistic screams in wild chimpanzees (Pan troglodytes schweinfurthii) vary as a function of social role. Journal of Comparative Psychology, 119(1), 67–77.
Smith, T. R., & Beran, M. J. (2018). Task switching in rhesus macaques (Macaca mulatta) and tufted capuchin monkeys (Cebus apella) during computerized categorization tasks. Journal of Experimental Psychology: Animal Learning and Cognition, 44(3), 229–246.
Spence, C., Nicholls, M. E. R., & Driver, J. (2001). The cost of expecting events in the wrong sensory modality. Perception & Psychophysics, 63(2), 330–336.
Stoet, G., & Snyder, L. H. (2007). Extensive practice does not eliminate human switch costs. Cognitive, Affective, & Behavioral Neuroscience, 7(3), 192–197.
Su, J., van Boxtel, J. J. A., & Lu, H. (2016). Social interactions receive priority to conscious perception. PLoS ONE, 11(8), e0160468.
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432.
Vestner, T., Gray, K. L. H., & Cook, R. (2020). Why are social interactions found quickly in visual search tasks? Cognition, 200, 104270.
Vestner, T., Tipper, S. P., Hartley, T., Over, H., & Rueschemeyer, S.-A. (2019). Bound together: Social binding leads to faster processing, spatial distortion, and enhanced memory of interacting partners. Journal of Experimental Psychology: General, 148(7), 1251–1268.
Vettori, S., Odin, C., Hochmann, J.-R., & Papeo, L. (2025). A perceptual cue-based mechanism for automatic assignment of thematic agent and patient roles. Journal of Experimental Psychology: General, 154(3), 787–798.
Wells, D. L., McDonald, C. L., & Ringland, J. E. (2008). Color preferences in gorillas (Gorilla gorilla gorilla) and chimpanzees (Pan troglodytes). Journal of Comparative Psychology, 122(2), 213–219.
White, P. A., & Milne, A. (1997). Phenomenal causality: Impressions of pulling in the visual perception of objects in motion. American Journal of Psychology, 110(4), 573–602.
Wilson, V. A. D., Sauppe, S., Brocard, S., Ringen, E., Daum, M. M., Wermelinger, S., Gu, N., Andrews, C., Isasi-Isasmendi, A., Bickel, B., & Zuberbühler, K. (2024). Humans and great apes visually track event roles in similar ways. PLOS Biology, 22(11), e3002857.
Wilson, V. A. D., Zuberbühler, K., & Bickel, B. (2022). The evolutionary origins of syntax: Event cognition in nonhuman primates. Science Advances, 8(25), eabn8464.
Wiseheart, M., Viswanathan, M., & Bialystok, E. (2016). Flexibility in task switching by monolinguals and bilinguals. Bilingualism: Language and Cognition, 19(1), 141–146.
Yao, Y., Vehtari, A., Simpson, D., & Gelman, A. (2018). Using stacking to average Bayesian predictive distributions (with discussion). Bayesian Analysis, 13(3), 917–1007.
Zanon, M., Lemaire, B. S., Papeo, L., & Vallortigara, G. (2024). Innate sensitivity to face-to-face biological motion. iScience, 27(2), 108793.
Zuberbühler, K. (2019). Evolutionary roads to syntax. Animal Behaviour, 151, 259–265.
Zuberbühler, K. (2022). Event parsing and the origins of grammar. Wiley Interdisciplinary Reviews: Cognitive Science, 13(3), e1587.
Zuberbühler, K., & Bickel, B. (2022). Transition to language: From agent perception to event representation. Wiley Interdisciplinary Reviews: Cognitive Science, 13(6), e1594.

Competing Interests

Competing Interests: The authors declare no conflict of interest.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.

Supplementary data