The human turn-taking system regulates the smooth and precise exchange of speaking turns during face-to-face interaction. Recent studies investigated the processing of ongoing turns during conversation by measuring the eye movements of noninvolved observers. The findings suggest that humans shift their gaze to the next speaker in anticipation, before the start of the next turn. Moreover, there is evidence that the ability to detect turn transitions in time relies mainly on the lexico-syntactic content provided by the conversation. Consequently, patients with aphasia, who often experience deficits in both semantic and syntactic processing, might have difficulty detecting turn transitions and shifting their gaze in time. To test this assumption, we presented video vignettes of natural conversations to aphasic patients and healthy controls while their eye movements were measured. The frequency and latency of event-related gaze shifts, with respect to the end of the current turn in the videos, were compared between the two groups. Our results suggest that, compared with healthy controls, aphasic patients have a reduced probability of shifting their gaze at turn transitions but do not show significantly increased gaze shift latencies. In healthy controls, but not in aphasic patients, the probability of a gaze shift at turn transition was increased when the video content of the current turn had a higher lexico-syntactic complexity. Furthermore, the results from voxel-based lesion symptom mapping indicate that the association between lexico-syntactic complexity and gaze shift latency in aphasic patients is predicted by brain lesions located in the posterior branch of the left arcuate fasciculus. Higher lexico-syntactic processing demands seem to lead to a reduced gaze shift probability in aphasic patients. This finding may represent missed opportunities for patients to place their contributions during everyday conversation.
The turn-taking system can be described as a speech exchange system that organizes the opportunities to speak during social interaction. Sacks, Schegloff, and Jefferson (1974) suggested that we follow a basic set of rules that govern turn construction during conversation. For example, either the current speaker actively passes the turn to the next speaker (speaker's selection) or the listener takes the turn at the next possible completion (self-selection). Following these basic rules ensures that there is only one speaker at a time. Self-selection evidently requires that the listener is able to project the end of the turn. According to Sacks et al. (1974), this ability relies on our knowledge of the structure of linguistic units, which enables us to project their ending in advance: we recognize the familiar linguistic units of a turn and are thus able to project where the turn will end. At this point, one could ask how the turn-taking system, and with it turn projection, might be affected by a general disorder of language processing like aphasia. We approached this question by assessing eye movements from the perspective of noninvolved observers to evaluate the timing of event-related gaze shifts at turn transitions.
Aphasia is an acquired language disorder and is a common consequence of brain damage to the language-dominant hemisphere. The patients' impairments typically encompass both verbal production and verbal comprehension deficits, which may alter conversational skills (Damasio, 1992). Nevertheless, previous research suggests that the fundamental communicative competence for effective turn-taking is preserved in aphasic patients (Ulatowska, Allard, Reyes, Ford, & Chapman, 1992; Holland, 1982; Prinz, 1980; Schienberg & Holland, 1980). For instance, Schienberg and Holland (Holland, 1982; Schienberg & Holland, 1980) reported, from the analysis of a conversation between two aphasic patients, that turn-taking behavior remained intact. The patients even showed repair strategies for turn-taking errors when both speakers were talking at the same time. The authors suggested that a naive observer who does not speak the language of the two patients would not even notice their language production deficits. However, even if turn-taking behavior per se seems to be preserved, the processing of linguistic information that has been shown to be crucial for the detection of turn transitions seems to be impaired in aphasic patients. De Ruiter, Mitterer, and Enfield (2006) presented audio recordings from telephone conversations, which contained isolated turns. They found that healthy participants could reliably indicate the expected end of a turn before it was completed. The authors further reported that this ability depended on the availability of lexico-syntactic information. The intonational contour itself was not a sufficiently strong cue to anticipate the end of a turn. Consequently, aphasic patients, who often show deficits in semantic and/or syntactic processing (Caplan, Waters, Dede, Michaud, & Reddy, 2007; Jefferies & Ralph, 2006; Caramazza & Berndt, 1978), should also have greater difficulty detecting the linguistic units necessary to project the end of the turn.
Eye tracking has recently become a well-established technique to study the real-time processing of ongoing turns in noninvolved observers (Holler & Kendrick, 2015) but, to the best of our knowledge, has not been applied to aphasic patients. In this type of experimental paradigm, participants are asked to watch prerecorded videos of dialogs while their eye movements are recorded. The subsequent analysis focuses on the timing of participants' gaze shifts in relation to the turn transitions between speaking actors in the video. Studies using this paradigm consistently showed that participants track the current speaker with their eye gaze (Keitel & Daum, 2015; Preisig et al., 2015; Hirvenkari et al., 2013; Keitel, Prinz, Friederici, von Hofsten, & Daum, 2013; von Hofsten, Uhlig, Adell, & Kochukhova, 2009). Previous research suggests that the planning and execution of a saccadic gaze shift require 200 msec (Griffin & Bock, 2000; Becker, 1991; Salthouse & Ellis, 1980; Westheimer, 1954). Hence, gaze shift reactions that occur within the first 200 msec after a turn is completed have been planned in advance and can thus be considered indicators of turn-end projection. Keitel et al. (2013) and Keitel and Daum (2015) found that healthy individuals shift their gaze on the majority of turn transitions in a time window spanning from 500 msec before the end of the current turn to the beginning of the next turn in the video. In contrast, Hirvenkari et al. (2013), who also presented prerecorded video stimuli, did not find evidence for turn-related anticipatory gaze shifts in healthy participants. Holler and Kendrick (2015) explained the conflicting results by discrepancies in the stimulus properties, such as the different degree of spontaneity of the conversational exchange. Indeed, Hirvenkari et al. (2013) analyzed only fast turn transitions in which the speaker change (from the end of one speaker's speech to the start of the other's) occurred within less than 300 msec.
In contrast, turn transitions in the video material used by Keitel and colleagues lasted on average between 860 and 930 msec. This comparison suggests that anticipation of the next turn might be modulated by the duration of the interspeaker gap. To exclude the impact of the interspeaker gap duration, we decided to study turn projection as defined with respect to the end of the current turn.
In this study, we addressed two main aspects of turn processing during video observation in aphasic patients: the detection of turn transitions and the timing of transition-related gaze shifts. We analyzed the frequency of turn transition-related gaze shifts as a measure of transition detection and compared it with the gaze shift frequency on events without transition between speakers (i.e., pauses within a speaker's utterance or within-speaker overlaps). Pauses and within-speaker overlaps also have the potential to indicate a turn transition to the observer. We hypothesized that if aphasic patients have difficulty detecting turn transitions per se, then they would also show fewer turn transition-related gaze shifts. As a timing estimate, gaze shift latencies were calculated relative to the end of the current turn. In contrast to other studies, our video material was relatively fast paced and not scripted. This led us to expect that the majority of gaze shifts would follow turn transitions, rather than precede them. Unlike previous studies, our stimulus material also included turn transitions with overlapping speech. Note that, at turn transitions with overlapping speech, the next turn begins before the current turn ends. We distinguished between transitions with interspeaker overlap and transitions with interspeaker gap. Transitions with overlapping speech might be more difficult to project, because they happen suddenly, that is, when the current speaker is interrupted by the next speaker (self-selection). Moreover, transitions with overlapping speech may represent a more ambiguous situation, in which it has to be resolved who is taking the next turn (Schegloff, 2000). The lexico-syntactic context helps healthy participants to reliably detect upcoming turn transitions (Magyari & De Ruiter, 2012; De Ruiter et al., 2006).
Moreover, a current model of turn-taking assumes that humans rely on the linguistic content to make predictions about the unfolding of the current turn (Pickering & Garrod, 2013). However, content-rich sentences with increasing levels of lexico-syntactic complexity also impose higher processing demands on aphasic patients who are impaired in syntactic and/or semantic processing (Caplan et al., 2007; Jefferies & Ralph, 2006). Therefore, we expected that the advantage given by additional lexico-syntactic information would be limited in aphasic patients. This led us to the hypothesis that higher lexico-syntactic complexity would be related either to a reduced detection of turn transitions or to increased gaze shift latencies in aphasic patients. In a recent study, Keitel and Daum (2015) reported an additional gain from available intonation, as reflected in shorter turn transition-related gaze shift latencies, in healthy participants. For this reason, we assessed whether the variance within the intonation curve, like the availability of intonation per se, would have an impact on turn processing. A higher gain triggered by the availability of video intonation would indicate a compensation of lexico-syntactic processing deficits in aphasic patients. Furthermore, we aimed to identify lesion sites associated with turn processing in aphasic patients, applying voxel-based lesion symptom mapping (VLSM). VLSM is a method that allows the direct relationship between tissue damage and behavior to be studied on a voxel-by-voxel basis, comparable with functional neuroimaging (Bates et al., 2003).
Sixteen aphasic patients with first-ever left-hemispheric stroke (mean age = 52.6 years, SD = 13.3 years; five women, one left-handed) and 23 healthy controls (mean age = 50.3 years, SD = 16.4 years; eight women; one left-handed, one ambidexter) were included in the study. The present analysis is based on data that have been documented in a previous publication (Preisig et al., 2015). All participants had normal or corrected-to-normal visual acuity and an intact central visual field of 30°. At examination, patients were in a subacute to chronic condition (mean months poststroke = 14.9, SD = 16.3). Aphasia diagnosis was based on standardized language assessment performed by clinical speech-language therapists. Aphasia severity was assessed by means of two subtests of the Aachener Aphasia Test (Huber, Poeck, & Willmes, 1984), namely, the Token Test and the Written Language. Previous research demonstrated that the discriminative validity of these two subtests in conjunction is as good as the one of the whole test battery (Willmes, Poeck, Weniger, & Huber, 1980). Before study participation, written informed consent was obtained from all participants. The study was approved by the local ethics committees of the State of Bern and the State of Luzern and was conducted according to the latest version of the Declaration of Helsinki.
Participants were seated in front of an SMI 250-Hz infrared eye-tracker (RED; SensoMotoric Instruments GmbH, Teltow, Germany), at a distance between 60 and 80 cm. After being seated, participants were instructed to attentively watch the presented videos. Before the main procedure, participants could familiarize themselves with the experimental setting during a practice run. The four videos depicted dyadic conversations between a female and a male actor. The videos were presented on a 22-in. computer screen, and the actors in the video sequences covered a visual angle of approximately 16°. Each video lasted 2 min. The order of presentation was randomized. The experimental procedure lasted between 20 and 30 min. Further details concerning the video sequences and the procedure are provided in our recent report (Preisig et al., 2015).
Analysis of the Video Data
First, orthographic transcriptions of the video stimuli were time-aligned with the speech signal from the corresponding video audio file using the Web service provided by the Bavarian Archive for Speech Signals (Kisler, Schiel, & Sloetjes, 2012; Schiel, 1999). The resulting TextGrid contained a time-aligned word segmentation of the speech signal. This TextGrid was then imported into the linguistic annotation software ELAN (Wittenburg, Brugman, Russel, Klassmann, & Sloetjes, 2006), where the time alignment of the transcript was verified and manually adjusted if necessary.
For the transcribed video data, we defined the events that represented potential turn transition signals for the observer and that could thus provoke a gaze shift away from the current speaker. According to Heldner and Edlund (2010), four event categories were defined: (1) overlap between speaker turns (interspeaker overlap), (2) period of silence between speaker turns (interspeaker gap), (3) period of silence within a speaker's utterance (pause), and (4) overlap within a speaker's utterance (within-speaker overlap; see Figure 1A). Although only interspeaker overlaps and interspeaker gaps represent events with a turn transition, pauses and within-speaker overlaps can also elicit gaze shifts away from the current speaker (as indicated in Figure 1B). The reason is that pauses and within-speaker overlaps can create an ambiguous situation where it is not clear for the observer who will take the next turn. The point of turn transition (i.e., turn relevance place) was set at the beginning of interspeaker gaps and interspeaker overlaps. For an overview of the video details (e.g., number and mean duration of events), see Table 1.
[Table 1: video duration (sec) and, for each event category (interspeaker overlap, interspeaker gap, pause, within-speaker overlap), number of events, Ø duration, and SD; cell values not recoverable.]
Ø Duration = mean duration in milliseconds.
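The four event categories can be made concrete with a small sketch. The Python function below is an illustration only, not the annotation procedure used in the study: the name `classify_event` and its inputs are hypothetical, and it assumes that each event has already been annotated with the speaker holding the floor before the event, the speaker holding the floor after it, and whether the event consists of overlapping speech or of silence.

```python
def classify_event(speaker_before: str, speaker_after: str, overlapping: bool) -> str:
    """Assign one of the four event categories described in the text.

    Assumption: a change of speaker across the event marks a turn transition.
    """
    is_transition = speaker_before != speaker_after
    if is_transition:
        return "interspeaker overlap" if overlapping else "interspeaker gap"
    return "within-speaker overlap" if overlapping else "pause"


def is_turn_transition(category: str) -> bool:
    """Only interspeaker overlaps and interspeaker gaps involve a turn transition."""
    return category in ("interspeaker overlap", "interspeaker gap")
```

Separating the category label from the transition flag mirrors the two factors used later in the analysis (event type and turn transition).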
To assess the content of each interevent time interval (IETI), we calculated separate indices taking into account lexico-syntactic complexity and intonation. To ensure that enough lexico-syntactic and intonational information was provided during each IETI, events were included in the analysis only if they were preceded by an IETI that contained at least six words.
The lexico-syntactic complexity index was calculated as a compound index, considering both the number and the median lexical frequency of the words during each IETI. We included separate measures for syntactic and lexical complexity because both properties can impose higher processing demands on patients with aphasia and thus may affect their predictions of turn transitions. The number of words per IETI was taken as a measure of the syntactic load (Lu, 2011; Larsen-Freeman, 1978). Higher syntactic load requires higher phonological STM capacities. Baldo and Dronkers (2006) found that aphasic patients show impairments in phonological STM. The median lexical frequency was adopted as an indicator of lexical complexity, because more common words are usually correctly perceived at much lower speech-to-noise ratios, a phenomenon referred to as the word frequency effect (Savin, 1963; Schuell, Jenkins, & Landis, 1961). Moreover, word frequency also affects lexical retrieval in aphasic patients (Luzzatti et al., 2002). The lexical frequency of the words within each IETI was calculated using WordGen (Duyck, Desmet, Verbeke, & Brysbaert, 2004). To build a compound index for lexico-syntactic complexity, lexical word frequency and syntactic complexity were combined using Stouffer's z score method (Stouffer et al., 1949). Using this method, z transformation was applied to the median word frequency and to the number of words for each IETI. Please note that a lower word frequency corresponds to higher lexical complexity and a higher number of words corresponds to higher syntactic complexity. Thus, the resulting z scores were combined for each IETI by subtracting the z-standardized median word frequency from the z-standardized number of words. Subtraction of the z scores takes into account that the two values run in opposite directions.
The values of the lexico-syntactic complexity index were also log-transformed with the natural logarithm, because their distribution was skewed.
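The combination of the two z scores described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis script: the function names are hypothetical, word frequencies are assumed to be available already (the study obtained them from WordGen), and the population standard deviation is assumed for the z transformation. The subsequent log transformation is left out because the text does not specify how non-positive index values were handled.

```python
from statistics import mean, median, pstdev


def z_scores(values):
    """Standardize values (assumption: population SD is used)."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("z scores are undefined when all values are equal")
    return [(v - m) / sd for v in values]


def complexity_index(word_frequencies_per_ieti):
    """Return one lexico-syntactic complexity value per IETI.

    word_frequencies_per_ieti: list of lists, one list of word
    frequencies per IETI (hypothetical input format).
    """
    n_words = [len(freqs) for freqs in word_frequencies_per_ieti]
    median_freq = [median(freqs) for freqs in word_frequencies_per_ieti]
    z_n = z_scores(n_words)
    z_f = z_scores(median_freq)
    # More words and lower median frequency both raise the index.
    return [zn - zf for zn, zf in zip(z_n, z_f)]
```

An IETI with many rare words therefore receives a high index, and an IETI with few, common words receives a low one.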
As a measure of intonation, we considered the change in the intonation curve toward the end of the IETI. When a speaker's turn is coming to an end, this can be indicated by a falling intonation or by a rising intonation when asking a question (Bögels & Torreira, 2015; Gravano & Hirschberg, 2011; Duncan, 1972). Therefore, we decided to take the variance in the intonation curve during the last six words of the IETI as a prosodic turn signal. For this purpose, the fundamental frequency (f0) of the video sound files in Hertz was extracted using the Praat software (Boersma & Weenink, 2001). Then, the variance within f0 was calculated over the last six words of each IETI.
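As an illustration of this measure, the following sketch computes the f0 variance over the last six words of one IETI. It is a sketch under assumptions rather than the authors' procedure: the function name and input format are hypothetical, the f0 track is assumed to have been exported beforehand (for example from Praat) as time-stamped samples, unvoiced frames are assumed to be marked as `None` and ignored, and the population variance is assumed.

```python
from statistics import pvariance


def f0_variance_last_words(f0_track, words, n_last=6):
    """Variance of f0 (Hz) over the last n_last words of an IETI.

    f0_track: list of (time_sec, f0_hz or None) samples.
    words:    list of (start_sec, end_sec) word intervals in temporal order.
    """
    if len(words) < n_last:
        raise ValueError("IETI contains fewer words than n_last")
    window_start = words[-n_last][0]
    window_end = words[-1][1]
    voiced = [
        f0 for t, f0 in f0_track
        if window_start <= t <= window_end and f0 is not None
    ]
    if len(voiced) < 2:
        raise ValueError("not enough voiced samples in the window")
    return pvariance(voiced)
```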
Analysis of Eye Movement Data
Saccadic data were extracted from the SMI analysis software (BeGaze; SensoMotoric Instruments GmbH, Teltow, Germany). Only direct gaze shifts between the face regions of the two actors in the video were included in the analysis, that is, saccades that started on the face region of one actor and ended on the face region of the other actor.
Event-related gaze shifts were selected for the analysis by means of a crucial time window. Every saccadic gaze shift occurring in a time window ranging from 1000 msec before to 1000 msec after an event was included in the analysis. Events were considered for analysis only if the preceding and subsequent IETIs lasted at least 1000 msec. The aim of this procedure was to prevent the crucial time window of one event from overlapping with that of another event occurring immediately before or after it.
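The event selection rule can be sketched as follows. The function name, the input format, and the treatment of the first and last event (bounded by the start and end of the video) are assumptions made for illustration; the text only states the 1000 msec criterion itself.

```python
def select_events(events, video_end_ms, min_ieti_ms=1000.0):
    """Return indices of events whose neighbouring IETIs are long enough.

    events: list of (start_ms, end_ms) tuples sorted by start time.
    Assumption: the IETI before the first event starts at 0 msec and the
    IETI after the last event ends at video_end_ms.
    """
    kept = []
    for i, (start, end) in enumerate(events):
        prev_end = events[i - 1][1] if i > 0 else 0.0
        next_start = events[i + 1][0] if i + 1 < len(events) else video_end_ms
        preceding_ieti = start - prev_end
        following_ieti = next_start - end
        if preceding_ieti >= min_ieti_ms and following_ieti >= min_ieti_ms:
            kept.append(i)
    return kept
```

With this rule, two events that are less than 1000 msec apart are both dropped, which keeps the time windows of the retained events from overlapping.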
Furthermore, we only included in the analysis (a) gaze shifts in the direction of the corresponding turn transition, that is, from the current to the next speaker and, (b) in case of an event without transition (pauses and within-speaker overlaps), gaze shifts leading away from the current speaker. Thus, random gaze shifts directed from the listener to the speaker were not included.
Two dependent variables were computed for every event that was included in the analysis: the binomial variable gaze shift reaction (0 = no gaze shift, 1 = gaze shift) and the continuous variable gaze shift latency in milliseconds. If a participant produced multiple gaze shifts within the crucial time window of a single event, only the first gaze shift was considered for the analysis. The gaze shift latency was calculated by subtracting the starting time of the corresponding event from the starting time of the saccade. Thus, a negative value indicates that the starting time of the saccade preceded the starting time of the event, and a positive value indicates the reverse. The average gaze shift frequency per participant was calculated as the ratio: number of gaze shifts per event category divided by the number of events per category.
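The two dependent variables can be illustrated with a short sketch that follows the sign convention stated above (negative latencies mean that the saccade started before the event). The function names are hypothetical, and the saccade onsets passed in are assumed to have already been restricted to direct gaze shifts between the two face regions in the relevant direction.

```python
def first_gaze_shift_latency(event_start_ms, saccade_starts_ms, window_ms=1000.0):
    """Latency (saccade onset minus event onset) of the first gaze shift.

    Returns None if no saccade falls within +/- window_ms of the event,
    which corresponds to gaze shift reaction = 0.
    """
    in_window = [s for s in saccade_starts_ms if abs(s - event_start_ms) <= window_ms]
    if not in_window:
        return None
    return min(in_window) - event_start_ms


def gaze_shift_frequency(latencies):
    """Share of events in one category that elicited a gaze shift.

    latencies: one entry per event; None means no gaze shift occurred.
    """
    if not latencies:
        raise ValueError("at least one event is required")
    return sum(lat is not None for lat in latencies) / len(latencies)
```

For example, an event at 5000 msec with saccades starting at 4800 and 5300 msec yields a latency of -200 msec, because only the first gaze shift in the window counts.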
Statistical analyses were conducted with IBM SPSS Statistics 21 and lme4 (Bates, Mächler, Bolker, & Walker, 2014), a package implemented in the open-source program R (R Core Team, 2014). Two separate repeated-measures ANOVAs were calculated for the dependent variables average gaze shift frequency and average gaze shift latency. For post hoc comparisons, pairwise t tests with Holm correction were calculated. Partial η² was computed as an estimate of effect size.
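For readers unfamiliar with the Holm correction, the following sketch shows the standard step-down adjustment of a set of p values. It is a generic illustration with a hypothetical function name and does not reproduce the software output used in the study.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

A comparison is then declared significant if its adjusted p value falls below the chosen alpha level.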
To take into account variables that unfolded during the course of our experiment (such as lexico-syntactic complexity and intonation variance during the IETI), we applied mixed effects modeling using the lme4 package. A key advantage of mixed effects models is that they do not require prior averaging (Baayen, Davidson, & Bates, 2008), because each participant has its own intercept, which randomly deviates from the mean intercept. Therefore, individual gaze shift reactions, which occurred in relation to different events, can be directly entered into the model. A generalized linear mixed model (GLMM) for binomial data was calculated for the dependent variable gaze shift reaction (0 = no gaze shift, 1 = gaze shift) using the glmer function. The glmer function provides p values for the fixed effects in the model based on asymptotic Wald tests. Least-square means were computed for post hoc comparisons in the GLMM. Furthermore, a linear mixed model (LMM) was calculated for the continuous variable gaze shift latency, applying the lmer function. For this model, the analyzed data were unbalanced, because participants shifted their gaze in relation to different events in the videos. Therefore, the lmer function cannot apply simple formulas to estimate the degrees of freedom and provides only a list of t values, but no p values. However, when the number of participants and the number of observations are sufficiently large, there is a strong correspondence between the t statistics and the z statistics. In this case, absolute t values larger than 2 can be considered significant (Ohl, Brandt, & Kliegl, 2011; Baayen et al., 2008).
Lesion analysis of imaging data was conducted using the open source software MRICron (Rorden, Karnath, & Bonilha, 2007). The brain lesions of 11 patients with available MRI scans were delineated as volumes of interest (VOIs) directly onto the transversal slices of the individual T2-weighted MRI scans. The MRI scan of each patient, including the lesion VOI, was then normalized into the Talairach space using the spatial normalization algorithm provided by SPM5 (www.fil.ion.ucl.ac.uk/spm/). The brain lesions of the five remaining patients, with an available CT scan, were mapped directly onto the CH2 template brain implemented in MRICron (Rorden & Brett, 2000). To relate behavioral measures to neuroanatomy, conventional lesion subtraction and VLSM analyses were conducted. For the lesion subtraction analysis, which only provides descriptive outcomes, lesion VOIs of patients who showed positive correlations between gaze shift latencies with lexico-syntactic complexity and intonation variance were contrasted with lesion VOIs of patients who showed the opposite pattern. MRICron offers two VLSM methods: the nonparametric Liebermeister test for binomial data and t tests for continuous behavioral data. We applied both methods, aiming to find converging evidence through these two types of analysis. Only voxels surviving conservative permutation thresholding with FWE correction (FWE-corrected level of p < .01) were considered in the results. Furthermore, voxels that were damaged in fewer than 20% of the patients were excluded from the analysis.
Average Gaze Shift Frequency
For the average frequency of gaze shifts per event category, a three-way repeated-measures ANOVA with Turn transition (transition, no transition) and Event type (overlapping speech, silence) as within-participant factors and Group (aphasic patients, healthy controls) as a between-participant factor revealed a significant main effect of the factor Turn transition (F(1, 37) = 129.741, p < .001, ηp² = .778) and a significant interaction between factors Turn transition × Group (F(1, 37) = 5.541, p = .024, ηp² = .130). As expected, the frequency of gaze shifts depended on the factor Turn transition. Participants were more likely to react to turn transitions as compared with events without transition (pauses and within-speaker overlaps; see Figure 2). More interestingly, a post hoc comparison on the average gaze shift frequency at turn transition revealed a statistical trend toward a group difference (p = .072). Aphasic patients tended to show a lower average gaze shift frequency at turn transition than healthy controls. The analysis of the gaze shift frequency per event category demonstrated that turn transitions elicited more gaze shifts than events without transition (p < .001). Hence, the subsequent analyses only focused on the processing of turn transitions.
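The reported effect sizes can be checked against the F statistics with the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The helper below is a hypothetical illustration of that arithmetic, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of Turn transition: F(1, 37) = 129.741
eta_transition = partial_eta_squared(129.741, 1, 37)

# Interaction Turn transition x Group: F(1, 37) = 5.541
eta_interaction = partial_eta_squared(5.541, 1, 37)
```

Rounded to three decimals, the two values are .778 and .130, which agrees with the effect sizes reported above.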
Gaze Shift Probability at Turn Transitions
A GLMM for the binomial data was modeled for the dependent variable gaze shift reaction (0 = no gaze shift, 1 = gaze shift) including the fixed factors Group (aphasic patients, healthy controls), Type of turn transition (interspeaker gap, interspeaker overlap), Lexico-syntactic complexity, and Intonation variance. Furthermore, participant and video were controlled as random effect terms.
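Written out, a model of this kind has the following general form. This is a sketch under assumptions: the text names the fixed factors and the random effect terms but does not state the exact model formula, so random intercepts for participant and video, as well as all symbol names, are assumed here for illustration.

```latex
\operatorname{logit}\,\Pr\left(y_{pvt} = 1\right)
  = \mathbf{x}_{pvt}^{\top}\boldsymbol{\beta} + u_{p} + w_{v},
\qquad
u_{p} \sim \mathcal{N}\!\left(0, \sigma_{u}^{2}\right),
\quad
w_{v} \sim \mathcal{N}\!\left(0, \sigma_{w}^{2}\right)
```

Here $y_{pvt}$ is the gaze shift reaction (0 or 1) of participant $p$ at turn transition $t$ in video $v$, $\mathbf{x}_{pvt}$ collects the fixed factors (Group, Type of turn transition, Lexico-syntactic complexity, Intonation variance) and their interactions, $\boldsymbol{\beta}$ are the fixed effect coefficients, and $u_{p}$ and $w_{v}$ are the assumed random intercepts for participant and video.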
The GLMM revealed significant main effects of Group (z = −1.988, p = .047), Type of turn transition (z = −3.167, p = .002), and Lexico-syntactic complexity (z = −3.529, p < .001). Healthy controls showed a higher probability than aphasic patients to shift their gaze on turn transitions. The probability for a gaze shift reaction was higher for turn transitions with interspeaker gap and for turn transitions that were preceded by an IETI with a higher lexico-syntactic complexity. Moreover, the GLMM revealed significant interactions between factors Group × Lexico-syntactic complexity (z = 2.847, p = .004), Group × Type of turn transition (z = 3.243, p < .001), and Lexico-syntactic complexity × Type of turn transition (z = 3.429, p < .001) and a trend toward an interaction between factors Group × Type of turn transition × Lexico-syntactic complexity (z = −1.889, p = .059). In healthy participants, the probability for a gaze shift increased with increasing values on the lexico-syntactic complexity index, whereas the opposite pattern could be observed in aphasic patients (see Figure 3A). Furthermore, healthy controls showed a higher probability for a gaze shift reaction on interspeaker gaps compared with interspeaker overlaps (p = .049; see Figure 3B). The interaction between factors Lexico-syntactic complexity × Type of turn transition revealed two opposing patterns: For interspeaker overlaps, gaze shift probability decreased with increasing lexico-syntactic complexity; for interspeaker gaps, we observed the reversed pattern. The statistical trend for the three-way interaction between factors Group × Type of turn transition × Lexico-syntactic complexity suggests that aphasic patients reacted less frequently to interspeaker overlaps with higher lexico-syntactic complexity than healthy participants. There was neither a main effect of factor Intonation variance (z = −0.782, p = .434) nor an interaction between factors Group × Intonation variance (z = 0.542, p = .588).
Gaze Shift Latency at Turn Transition
Gaze shifts followed, rather than preceded, the turn transitions in the video (patients, interspeaker gap: M = 280.82, SE = 69.11; patients, interspeaker overlap: M = 248.21, SE = 79.48; controls, interspeaker gap: M = 226.36, SE = 44.24; controls, interspeaker overlap: M = 158.81, SE = 66.63; latencies in msec). The repeated-measures ANOVA on the dependent variable Average gaze shift latency revealed neither significant main effects for Group or Type of turn transition nor an interaction between these two factors. Furthermore, an LMM was modeled on the dependent variable Gaze shift latency at turn transition. The LMM included the same fixed factors and random factors as the GLMM introduced above. We found significant interactions between factors Group × Intonation variance (t = 2.116) and Group × Lexico-syntactic complexity × Intonation variance (t = −2.249). As depicted in Figure 4, healthy participants showed the shortest gaze shift latencies if both lexico-syntactic complexity and intonation variance were increased. In contrast, aphasic patients did not show such a clear pattern.
The mean volume of aphasic patients' individual brain lesions was 96.14 cm³ (SD = 17.00 cm³). One patient was excluded from the lesion analysis because he was left-handed. To identify lesion sites associated with increased gaze shift latencies because of processing of lexico-syntactic complexity and intonation variance, we performed two separate VLSM analyses: one with two binary predictors and another with two continuous predictors. For this purpose, correlation coefficients were calculated for each participant between the lexico-syntactic complexity and the intonation variance during the IETI, on the one hand, and the gaze shift latency registered at the corresponding turn transition, on the other. A positive correlation indicates that higher lexico-syntactic complexity and/or more intonation variance is associated with increased gaze shift latencies. In contrast, a negative correlation indicates that a participant can benefit from additional lexico-syntactic content or more intonation variance, as reflected in shorter gaze shift latencies. For the first VLSM model, correlation coefficients for the variables lexico-syntactic complexity and intonation variance were transformed into separate binary predictors. To binarize the predictor, the 75th percentile was used as cutoff score, because higher positive correlations are related to maladaptive processing (i.e., increased gaze shift latencies). This resulted in the following two values: 0 (correlation coefficient > 75th percentile of the correlation coefficients obtained in healthy participants) and 1 (correlation coefficient < 75th percentile of the correlation coefficients obtained in healthy participants).
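The construction of the binary predictor can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: the function names are hypothetical, a Pearson correlation is used, the 75th percentile is computed with linear interpolation, and a coefficient exactly equal to the cutoff is coded as 1. The 0/1 coding itself follows the description above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences (assumed measure)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def percentile(values, q):
    """q-th percentile with linear interpolation (assumed method)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def binarize(patient_r, control_rs, q=75.0):
    """0 if the patient's coefficient exceeds the controls' cutoff, else 1."""
    cutoff = percentile(control_rs, q)
    return 0 if patient_r > cutoff else 1
```

A patient whose latencies increase strongly with lexico-syntactic complexity thus falls above the cutoff derived from the healthy participants and is coded 0.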
The binomial Liebermeister test, calculated for the first VLSM model, revealed a significant lesion cluster (FWE-corrected level of p < .01) for the factor Lexico-syntactic complexity on the posterior branch of the left arcuate fasciculus (Talairach coordinates = −37, −48, 25; as illustrated in Figure 5) but no significant cluster for the factor Intonation variance. This result was confirmed by the second VLSM model, where the individual correlation coefficients were entered as a continuous predictor and t tests (FWE-corrected level of p < .01) were applied to perform comparisons on a voxel-by-voxel basis (Talairach coordinates = −36, −48, 24). Furthermore, we verified the reliability of the VLSM models by running an additional lesion subtraction analysis. Distinct lesion overlap maps were generated for the two patient subgroups defined according to the factor Lexico-syntactic complexity. In line with the VLSM analyses, the group subtraction analysis revealed that patients with increased gaze shift latencies at turn transitions, because of higher lexico-syntactic complexity of the preceding turn, showed an exclusive lesion cluster on the posterior branch of the arcuate fasciculus (Talairach coordinates = −37, −47, 23; Figure 6C).
This study aimed at gaining new insights into real-time processing of ongoing turns in aphasic patients by analyzing the frequency and timing of turn transition-related gaze shifts during video observation of naturalistic conversations. The main finding is that aphasic patients showed a lower probability to shift their gaze at turn transitions than healthy participants. Whether a gaze shift occurred depended on the lexico-syntactic complexity of the video content preceding a particular turn transition. In healthy controls, higher lexico-syntactic complexity led to higher gaze shift probabilities. The opposite pattern, that is, a lower gaze shift probability with higher lexico-syntactic complexity, was found in aphasic patients. The timing of gaze shifts depended on both the lexico-syntactic complexity and the intonation variance provided before turn transitions. Healthy controls, but not aphasic patients, benefited from intonation variance when lexico-syntactic complexity was increased. Furthermore, we found that lesions of the posterior branch of the left arcuate fasciculus predicted the impact of lexico-syntactic complexity on gaze shift latency in aphasic patients.
Our results indicate that turn transitions trigger more gaze shifts in both groups than pauses and within-speaker overlaps. This implies that aphasic patients did not show unsystematic visual exploration behavior during video observation and that they were per se able to reliably detect turn transitions. However, aphasic patients showed a lower probability to react to turn transitions than healthy controls. This observation is supported by converging evidence from the repeated-measures ANOVA conducted on average gaze shift frequency over all event categories and from the GLMM including the binomial data from individual turn transitions in the video material.
Overall, gaze shift probability at turn transitions depended on the type of turn transition and on the complexity of the lexico-syntactic information provided before the transition itself, but not on the intonation variance in the same time window. This finding fits well with evidence from previous research, which suggested that the ability to detect upcoming turn transitions mainly relies on the availability of lexico-syntactic information (De Ruiter et al., 2006). Our results imply that additional lexico-syntactic information may help healthy participants to detect upcoming turn transitions more accurately. Gaze shift probability in healthy participants was higher for transitions with interspeaker gaps. This supports our hypothesis that this type of turn transition is more reliably detected. In the case of an interspeaker gap, it is probably easier to resolve who is taking the next turn than for interspeaker overlaps (Schegloff, 2000). Moreover, healthy participants shifted their gaze more frequently at turn transitions that were preceded by segments with higher lexico-syntactic complexity. As expected, lexico-syntactic complexity had the opposite effect on gaze shift probability in aphasic patients, suggesting that aphasic patients have difficulty integrating this parameter when initiating their gaze shifts. Previous research clearly indicates impairments in lexico-syntactic processing in aphasic patients (Caplan et al., 2007; Jefferies & Ralph, 2006; Caramazza & Berndt, 1978). Moreover, in a recent eye-tracking study from our group, we found reduced understanding of syntactically complex sentences in aphasic patients because of impaired recognition and integration of morphosyntactic cues (Schumacher et al., 2015). Therefore, the video segments with higher lexico-syntactic complexity might be very demanding for aphasic patients and were thus accompanied by a reduced gaze shift probability.
In line with the results reported by Hirvenkari et al. (2013), we found that the majority of gaze shifts occurred after turn transitions. The average gaze shift latency indicates that some gaze shifts were planned before the completion of the turn. Contrary to our expectation, aphasic patients did not show an increased gaze shift latency. However, we found an interaction between group membership (aphasic patients, healthy controls), lexico-syntactic complexity, and intonation variance. Healthy controls were most likely to project the end of a speaker turn when lexico-syntactic complexity and intonation variance were increased (Figure 4). In a recent study, Keitel and Daum (2015) also found an effect of intonation on gaze shift latencies in healthy participants. According to our results, aphasic patients did not benefit from the interaction between intonation variance and lexico-syntactic complexity. Earlier studies found that the processing of linguistic prosody mainly relies on the right hemisphere (Brådvik et al., 1991; Weintraub, Mesulam, & Kramer, 1981), which would suggest that the recognition of linguistic prosody should not be affected in aphasic patients with left-hemispheric brain lesions. This finding has later been supported by neuroimaging studies in healthy participants, which reported right-hemispheric specialization when linguistic prosody was compared with other speech processes (Kyong et al., 2014; Strelnikov, Vorobyev, Chernigovskaya, & Medvedev, 2006; Meyer, Alter, Friederici, Lohmann, & von Cramon, 2002). In contrast, studies that directly compared linguistic prosody with emotional prosody found a primary involvement of the left brain hemisphere (Wildgruber et al., 2004; Pell & Baum, 1997). 
Shapiro and Nagel (1995) suggested that aphasic patients with deficits in lexico-syntactic processing may not benefit from additional prosodic information when parsing sentential units because they cannot concurrently process syntactic and prosodic information. This notion is supported by a recent imaging study in healthy participants, which suggests that the recognition of linguistic prosody depends on the activity in a bilateral network (Kreitewolf, Friederici, & von Kriegstein, 2014).
VLSM revealed that the modulation of the gaze shift latencies by the lexico-syntactic complexity was predicted by a lesion cluster located on the posterior end of the arcuate fasciculus, between the inferior parietal lobe and the superior temporal lobe. This area is part of the left-hemispheric language network, lying in close vicinity of the left posterior superior temporal gyrus, the left angular gyrus, and the TPJ. Functional imaging studies in healthy participants showed that both the left superior temporal gyrus and the left angular gyrus are involved in syntactic and semantic processing (Buchsbaum & D'Esposito, 2008; Graves, Grabowski, Mehta, & Gupta, 2008; Humphries, Binder, Medler, & Liebenthal, 2006; Buchsbaum, Hickok, & Humphries, 2001; Keller, Carpenter, & Just, 2001).
Several studies consistently reported that the left posterior superior temporal cortex is activated during written sentence comprehension (Cooke et al., 2002; Just, Carpenter, Keller, Eddy, & Thulborn, 1996) and auditory sentence comprehension (Buchsbaum et al., 2001). Moreover, activity in the left superior temporal gyrus seems to be modulated by syntactic complexity (Newman, Ikuta, & Burns, 2010; Friederici, Makuuchi, & Bahlmann, 2009; Kinno, Kawamura, Shioda, & Sakai, 2008; Cooke et al., 2002; Just et al., 1996) and word frequency (Graves et al., 2008). However, this area seems to be involved not only in language processing but also in audiovisual integration (Stevenson & James, 2009; Beauchamp, Lee, Argall, & Martin, 2004; Calvert, Campbell, & Brammer, 2000) and face processing (Haxby, Petit, Ungerleider, & Courtney, 2000). Taken together, the left superior temporal cortex is clearly involved in integrating different types of information during language processing. Friederici (2011) suggested in a review article that the left posterior superior temporal cortex, together with the STS and the BG, might be involved in the integration of semantic and syntactic information.
The left angular gyrus also seems to support both sentence-level semantic and syntactic processing. Bavelier et al. (1997) showed increased activation of the left angular gyrus in response to additional syntactic information, that is, when comparing sentence reading with word list reading. Humphries et al. (2006) found greater activity in the left angular gyrus for semantically congruent sentences compared with sentences containing random words or pseudowords. Humphries and colleagues (2006) further suggested that the left angular gyrus might be more strongly engaged in semantic processes than in syntactic ones, because it requires lexical information to be activated, but not necessarily syntactic information. Interestingly, Keller et al. (2001) found that angular gyrus activation interacts with lexical word frequency, showing stronger activation for more complex sentences that include low-frequency words than for less complex sentences.
The left TPJ, also known as left Sylvian-parietal-temporal area, is thought to function as a sensorimotor interface between the phonological networks in the bilateral superior temporal gyrus and the articulatory networks in the anterior language system (Hickok & Poeppel, 2007). This area has also been shown to be crucial for auditory verbal working memory (Buchsbaum & D'Esposito, 2008; Shallice & Warrington, 1977). Furthermore, the left TPJ, which is located laterally with respect to the lesion cluster identified in this study, also belongs to a widely distributed neural network underlying Theory of Mind mechanisms. Ciaramidaro et al. (2007) investigated the contribution of different nodes within this network in an fMRI study showing that the left TPJ was selectively activated when participants had to anticipate the endings of stories that included social and especially communicative intentions.
In this study, we investigated the detection of turn transitions and the timing of transition-related gaze shifts in aphasic patients. We found that the detection of turn transitions in healthy participants depends on the lexico-syntactic information provided before the transition itself. Moreover, healthy controls were more likely to project the end of turns when higher lexico-syntactic complexity was associated with a greater amount of prosodic information. We showed that gaze shift probability in aphasic patients was reduced at transitions that were preceded by more complex turns with higher lexico-syntactic processing demands.
This study was supported by the Swiss National Science Foundation (grant no. 320030_138532/1). We also thank Sandra Perny, Susanne Zürrer, Julia Renggli, Marianne Tschirren, Corina Wyss, Carmen Schmid, Gabriella Steiner, Monica Koenig-Bruhin, Nicole Williams, Reto Hänni, Gianni Pauciello, Silvia Burren, Andreas Löffel, Michael Schraner, Nina Kohler, Anita Mani-Luginbühl, Hans Witschi, Sarah Schaefer, Martin Zürcher, and Michael Rath for their assistance.
Reprint requests should be sent to Prof. Dr. med. René Müri, Perception and Eye Movement Laboratory, Departments of Neurology and Clinical Research, Inselspital, University Hospital Bern, Freiburgstrasse 10, 3010 Bern, Switzerland, or via e-mail: Rene.email@example.com.