1 New Debates on Co-speech Gesture Semantics
Following initial formal work on the semantic integration of gestures in discourse (e.g., Lascarides and Stone 2009) and on iconic aspects of gestural semantics (Giorgolo 2010), there has recently been a resurgence of interest in the formal and experimental semantics of co-speech gestures. It was motivated on three fronts: co-speech gestures have become crucial to understanding whether spoken language has means of iconic enrichment similar to those of sign language (Goldin-Meadow and Brentari 2017); co-speech gestures have become the topic of new debates in theoretical semantics, pertaining to their proper place in the inferential typology (Ebert and Ebert 2014, Schlenker 2018a); and experiments have been conducted to try to adjudicate these new debates (e.g., Tieu et al. 2017, 2018).
First, co-speech gestures became crucial to conducting a proper comparison between the semantics of signed and spoken languages. Iconic modulations (i.e., modifications of a lexical sign to represent aspects of the denoted object or events) are a notoriously fertile means of semantic enrichment in sign language, but they are more impoverished in spoken language. For instance, in ASL (American Sign Language) the movement of the hands realizing the verb GROW can be made faster or broader to denote a growth process that is quicker or of greater amplitude (Schlenker, Lamberton, and Santoro 2013). In English, the vowel of the adjective long can be modulated to refer to a very long process (Okrent 2002), as in The talk was looong (see also Fuchs et al. 2018). But it is clear that intrinsic limits of vocal iconicity make this a relatively circumscribed process. By contrast, co-speech gestures afford a fertile means of iconic enrichment of speech, one that is prima facie comparable to iconicity in sign. This motivated Goldin-Meadow and Brentari’s (2017) intimation that sign with iconicity should be compared to speech with co-speech gestures rather than to speech alone.
The question, however, is whether iconic modulations in sign are genuinely similar to iconic enrichments of speech by way of co-speech gestures. With respect to the iconic information that is conveyed, the similarity might be real. But on another front, results from the recent literature on signs and gestures highlight an important difference: in Schlenker 2018b, I argue that iconic modulations (i.e., the modification of a sign or word rather than its enrichment by an external addition) can often make an at-issue contribution. Thus, the sentence If the talk is loooong, I’ll leave before the end can be understood to make the same kind of (at-issue) claim as If the talk is very long, I’ll leave before the end. Similar observations were made about iconic modulations of GROW in ASL. By contrast, researchers have suggested in different ways that co-speech gestures do not make at-issue contributions, at least not in the absence of a (potentially costly) process of adjustment.
Two types of non-at-issue content have played a prominent role in recent debates on co-speech gestures. Presuppositions are characterized by their projective behavior: they are inherited by complex sentences in ways that distinguish them from at-issue entailments as well as from other pragmatic inferences. Thus, x regrets q-ing presupposes that x q-ed; the presupposition is inherited by yes-no questions as in (1a) and gives rise to a universal inference under none-type quantifiers as in (1b) (see Chemla 2009 for experimental results).
(1) a. Does Ann regret helping her daughter?
       ⇒ Ann helped her daughter
    b. None of these 10 women regrets helping her daughter.
       ⇒ each of these 10 women helped her daughter
Supplements are the semantic contributions of appositive relative clauses; they are thought not to interact scopally with operators, and thus to always have “wide scope” behavior (Potts 2005), as illustrated in (2) (I will refine this point shortly).
(2) a. Did Ann help Robin, who was taking an exam?
       ⇒ Robin was taking an exam
    b. None of these 10 women helped Robin, who was taking an exam.
       ⇒ Robin was taking an exam
In pioneering work, Ebert and Ebert (2014) suggest that co-speech gestures contribute supplements. In Schlenker 2015, 2018a,b, I argue instead that co-speech gestures contribute presuppositions of a special sort, called cosuppositions. To illustrate, both lines argue that in (3a) the contribution of the co-speech gesture is not at-issue, whereas the modifier in (3b) does make an at-issue contribution.1 (Notation: The gesture cooccurs with the expression that immediately follows the picture (or the capitalized transcription LIFT) and is connected to it by _ .)
(3) a. Did John LIFT_help his son?
       ⇒ if John helped his son, he did so by lifting him
    b. Did John help his son like LIFT_this?
       ⇏ if John helped his son, he did so by lifting him
Importantly, on the “supplemental” view and the cosuppositional view alike, co-speech gestures do not make an at-issue contribution, and for this reason sign with iconic modulations does not in general yield the same results as speech with co-speech gestures.
The next question is to determine which of these two analyses of co-speech gestures, if either, is correct. For Ebert and Ebert (2014), (3a) has a meaning akin to that of Did John help his son, which involved/would have involved lifting him? A supplement (realized in this example, but not in the co-speech case, by an appositive relative clause) modifies the meaning of the VP and for this reason “projects” out of the scope of the question operator. In contrast, in Schlenker 2015, 2018a,b I argue that in (3a) the lifting co-speech gesture (which I will write as LIFT) cooccurring with the verb triggers a presupposition of the form if x helped, x lifted. This presupposition is called a “cosupposition” because it is conditionalized on the at-issue contribution of the modified expression.
An important part of the empirical debate pertains to examples such as those in (4). Proponents of the cosuppositional view argue that in this case we obtain universal inferences that are reminiscent of presupposition projection under none-type quantifiers, as illustrated in (1b). This environment is particularly important because, in some cases at least, supplements are degraded in the scope of negative operators (Potts 2005), as is illustrated in (5)–(6).
(4) a. None of these 10 guys LIFT_helped his son.
       ⇒ for each of these 10 guys, if he had helped his son, this would have involved some lifting
    b. None of these 10 guys helped his son like LIFT_this.
       ⇏ for each of these 10 guys, if he had helped his son, this would have involved some lifting
(5) a. One/None of these 10 guys LIFT_helped his son.
    b. One/#None of these 10 guys helped his son, which (by the way) he did by lifting him.
(6) a. It’s likely/It’s unlikely that John LIFT_helped his son.
    b. It’s likely/#It’s unlikely that John helped his son, which (by the way) he did by lifting him.
Experimental means have also been brought to bear on this debate by Tieu et al. (2017) (truth-value judgments and picture selection task) and Tieu et al. (2018) (inferential judgments). They broadly confirmed the inferential predictions of the cosuppositional view: co-speech gestures interact with logical operators in approximately the same way as presupposition triggers do. While there were slightly different results in the two experiments,2 both suggested that co-speech gestures are “weak” triggers: the presuppositions they trigger can easily be turned into the at-issue component (in technical parlance, they are easy to “locally accommodate”). This is not at all a final refutation of the supplemental view, however, and the debate is still ongoing.
2 Why Cosuppositions?
2.1 Initial Justifications
While the cosuppositional view has garnered some experimental support, it leaves an important explanatory question open: if indeed a co-speech gesture as in (3a) and (4a) triggers a presupposition, why is it conditionalized on the content of the VP? In Schlenker 2018a,b, I consider two options, but neither is entirely satisfactory.
One possibility is that the co-speech gesture contributes a presupposed conjunct processed after the expression it cooccurs with (let us call this Theory I). Simplifying somewhat, help his son with the co-speech gesture would thus contribute a representation of the form Help & Lift, where Help has the meaning help his son, and the underlined Lift contributes a presupposition akin to lift his son. The fact that Lift comes after Help ensures, in standard theories of presupposition projection (e.g., Heim 1983), that the final presupposition is conditionalized on Help, as the presupposition of the second conjunct can be satisfied by content that appears in the first conjunct. But on this view, it is a mystery why a gesture that cooccurs with the verb is processed as if it were postposed. One could attempt to argue that a gesture cannot be linearized after the expression it modifies, but this is just not true: as argued in Schlenker 2018b, “post-speech gestures” are acceptable in a variety of environments, but have a different semantic signature.3
An alternative possibility (Theory II) is that the mode of composition of co-speech gestures guarantees that (relative to its local context) the value of the VP should entail the content of the gesture—but it is unclear where this requirement comes from.
2.2 An Alternative within the Transparency Theory
I will propose instead that the conditionalized nature of cosuppositions might be made to follow from an extension of the Transparency theory (Schlenker 2008), which I will now briefly introduce.
In Schlenker 2008, I argue that the presupposition d of a (predicative/propositional) trigger dd′ is a normal entailment that “wants” to be articulated as a separate conjunct, as stated in (7), where I continue to underline presuppositions.
(7) Be Articulate
In any syntactic environment, express . . . dd′ . . . as . . . (d and dd′) . . . (unless independent pragmatic principles rule out the full conjunction).
To illustrate, one should “be articulate” and say . . . it’s raining and John knows it . . . (with two conjuncts) rather than . . . John knows that it’s raining . . . (without a conjunction). Be Articulate is controlled by a Gricean principle of manner, Be Brief, which prohibits unnecessary prolixity and takes precedence over Be Articulate—thus ruling out If it is raining, it is raining and John knows it. In its incremental version (which takes as given linguistic information that figures before the relevant expression, but not linguistic information that follows it), Be Brief prohibits one from saying . . . [d and _ ] . . . in case no matter what the second conjunct is, no matter what the end of the sentence turns out to be, the same semantic result could be obtained by replacing [d and _ ] with _ (which means that d is redundant). This is stated formally in (8) (here and throughout this section, I discuss a syntax akin to English; I discuss crosslinguistic refinements in section 3).
(8) Be Brief – Incremental version
Given a context set C, a predicative/propositional occurrence of d is infelicitous in a sentence that begins with a (d and if, for any expression g of the same type as d and for any sentence completion b′,
C ╞ a (d and g) b′ ⇔ a g b′.
In our example, If it is raining, [it is raining and _ ] violates the incremental version of Be Brief, because no matter what the second conjunct is, the first conjunct can be eliminated without informational loss (this holds even if the sentence as a whole is nontrivial, as in If it is raining, it is raining and it is cold).
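As a rough illustration (not part of the formulation in (8) itself), the incremental check can be sketched in a toy propositional model. The sketch assumes that worlds are truth-value assignments to two atoms (rain, cold), that propositions are sets of worlds, that the context set C contains all four worlds, that if is a material conditional, and that nothing follows the bracketed conjunction; all function names are hypothetical.

```python
from itertools import chain, combinations, product

# Toy model (an assumption for illustration): a world is a pair (rain, cold).
WORLDS = frozenset(product([True, False], repeat=2))
RAIN = frozenset(w for w in WORLDS if w[0])
COLD = frozenset(w for w in WORLDS if w[1])

def all_propositions(worlds):
    """Every subset of the worlds, i.e. every candidate second conjunct g."""
    ws = list(worlds)
    subsets = chain.from_iterable(combinations(ws, n) for n in range(len(ws) + 1))
    return [frozenset(s) for s in subsets]

def conditional(antecedent, consequent):
    """Material conditional, represented as a set of worlds (an assumption)."""
    return (WORLDS - antecedent) | consequent

def incrementally_redundant(context, antecedent, d):
    """True if 'if antecedent, (d and g)' and 'if antecedent, g' agree
    throughout the context set for every second conjunct g."""
    for g in all_propositions(WORLDS):
        with_d = conditional(antecedent, d & g) & context
        without_d = conditional(antecedent, g) & context
        if with_d != without_d:
            return False
    return True

# 'If it is raining, [it is raining and _ ]': the first conjunct is redundant.
print(incrementally_redundant(WORLDS, RAIN, RAIN))  # True
# 'If it is raining, [it is cold and _ ]': the first conjunct is informative.
print(incrementally_redundant(WORLDS, RAIN, COLD))  # False
```

In this toy setting, the first call corresponds to the Be Brief violation discussed above, while the second shows that a first conjunct that does not follow from the antecedent passes the check.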
Putting (7) and (8) together, dd′ is acceptable in a sentence of the form a dd′ b just in case the attempt to be “articulate” satisfies the equivalence in (8), thus violating Be Brief. In Schlenker 2007, I prove that this Transparency theory derives the results of Heim 1983 for a fragment with generalized quantifiers, modulo technical assumptions.
(7)–(8) are tailored to the case of “articulated” competitors of the form . . . (d and dd′) . . . . But I propose that a further option (not explored in Schlenker 2008) explains the conditionalized nature of cosuppositions. As already suggested by (3b) and (4b), the content of a co-speech gesture G modifying d′ in . . . G_d′ . . . is naturally “articulated” as . . . d′ g . . . , where g is a postverbal modifier with the same content as G. For instance, John LIFT_helped his son can naturally be turned into an articulated expression: namely, John helped his son like LIFT_this. If d′ g is conjunctively interpreted, dynamic semantics predicts that g, the postposed modifier, is redundant in its local context (and thus violates Be Brief) just in case the local context c′ of d′ guarantees that the update of c′ with d′ entails g (that is, c′ ╞ d′ ⇒ g). This derives the conditionalized presupposition we observe. (While like this is a particularly simple means of “articulation,” all that matters to derive the desired result is that the at-issue gestural content should come right after the VP d′.)
Within the Transparency theory, the postposed nature of the modifier explains why the gestural presupposition is conditional, modulo the extension of (7)–(8) sketched in (9a–b). (9b) rules out the articulated competitor . . . helped his son like LIFT_this . . . just in case no matter which further modifier is added, no matter how the sentence ends, the like this–phrase can be eliminated without affecting the truth conditions. This means that the postverbal modifier must be redundant after the verbal meaning has been computed.
(9) Consider a sentence a G_d′ b, where G is a gesture with content g, cooccurring with a (modifier-compatible) expression d′.
a. Modified Be Articulate
Say a (d′ g) b rather than a G_d′ b, unless this violates (b).
b. Modified Be Brief – Incremental version
Given a context set C, do not say a (d′ g) b if g is incrementally redundant (= trivial), in the sense that for any modifier c′, for any sentence completion b′,
C ╞ a ((d′ g) c′) b′ ⇔ a (d′ c′) b′.
Assuming that the modifiers are intersective, (9b) is equivalent to the acceptability conditions predicted by (7)–(8) for a (d′ and gd*) b, where d* may be any at-issue component, as illustrated in (10). In particular, these are the very conditions that are predicted for (d′ and g), where g is a purely presuppositional conjunct appearing after d′. This is precisely what Theory I above needed to stipulate, but now the result is derived from the fact that the articulated competitor has a modifier that comes after the modified expression.
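The link between the redundancy condition in (9b) and the conditionalized presupposition can be illustrated with a small set-theoretic sketch. It keeps the assumption that modifiers are intersective, and it additionally assumes a toy domain of events, a local context given as a set of such events, and no surrounding sentence material; the helper names are hypothetical, and the sketch is only meant to show that the two conditions coincide in such a model.

```python
from itertools import chain, combinations

# Toy domain (an assumption for illustration): events described as (action, manner).
EVENTS = frozenset({
    ("help", "lift"), ("help", "push"), ("ignore", "lift"), ("ignore", "none"),
})
HELP = frozenset(e for e in EVENTS if e[0] == "help")  # d': help his son
LIFT = frozenset(e for e in EVENTS if e[1] == "lift")  # g: content of the gesture

def powerset(items):
    """All subsets of a finite set, used for modifiers and local contexts."""
    xs = list(items)
    subsets = chain.from_iterable(combinations(xs, n) for n in range(len(xs) + 1))
    return [frozenset(s) for s in subsets]

def modifier_redundant(local_context, d_prime, g):
    """(9b)-style check with intersective modifiers: for every further
    modifier m, adding g after d' makes no difference in the local context."""
    return all(
        local_context & d_prime & g & m == local_context & d_prime & m
        for m in powerset(EVENTS)
    )

def cosupposition_holds(local_context, d_prime, g):
    """c' entails d' => g: every d'-event in the local context is a g-event."""
    return (local_context & d_prime) <= g

# The two conditions coincide for every possible local context in this toy model.
assert all(
    modifier_redundant(c, HELP, LIFT) == cosupposition_holds(c, HELP, LIFT)
    for c in powerset(EVENTS)
)
```

On these assumptions, the articulated competitor is ruled out exactly in those local contexts in which every helping event is a lifting event, which is the conditionalized inference attributed to the co-speech gesture.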
3 Refinements and Extensions
3.1 Beyond Linear Order
While I have followed Schlenker 2007, 2008 in framing the discussion in terms of linear order, this is now known to be incorrect in the general case, although the “right” notion of order is still under debate (Chierchia 2009, Ingason 2016, Mayr and Romoli 2016, Mandelkern and Romoli 2017). For our purposes, the predicted generalization should be that gestural cosuppositions are triggered whenever the full modifier that “articulates” the gestural content is processed after the modified words. Whether a modifier is processed “after” an expression (with a notion of order that need not be linear) can be assessed on the basis of intuitive redundancy effects that yield violations of Be Brief, and such data sometimes argue against linear analyses.
3.1.1 Ingason (2016) on Japanese
As an example, in order to account for redundancy effects in Japanese, Ingason (2016) argues that in some cases a notion of hierarchical order is called for. The contrast in the English examples in (11) could be explained by either a linear or a hierarchical order, since the noun is both hierarchically higher and to the left of the modifier. But things are different in the Japanese examples in (12), as they suggest that the hierarchical rather than the linear order is the right one for computing redundancy effects: yamome ‘widow’ is strictly more informative than zyosei ‘woman’, yet zyosei can appear after yamome in (12a), presumably because it is in a structurally higher position.
(11) a. John met [a woman [who is a widow]].
     b. #John met [a widow [who is a woman]].
(12) a. Taro-ga [[yamome-dearu] zyosei-ni] atta.
        Taro-NOM [[widow-COP] woman-DAT] met
        ‘Taro met a woman who is a widow.’
     b. #Taro-ga [[zyosei-dearu] yamome-ni] atta.
        Taro-NOM [[woman-COP] widow-DAT] met
        ‘Taro met a widow who is a woman.’
In sum, the Japanese data argue that Be Brief should, in some cases at least, be stated in structural rather than linear terms. Since presuppositional inferences predicted by the Transparency theory derive from redundancy effects (through Be Brief), application of the theory to further constructions and languages should be conducted in tandem with an assessment of redundancy effects, with the possibility that the linear account introduced in section 2.2 ought to be revised.
While we do not have data on gestural cosuppositions in Japanese, an anonymous reviewer notes that German also presents a problem for a purely linear analysis. I start from the contrast in (13), modified from one proposed by the reviewer; it repeats for German the contrast in (3), but with the difference that the adverbials that appear in the at-issue control in (13b) appear before rather than after the verb.4
(13) a. Wird Hans seinem Sohn LIFT_helfen?
        will Hans his son LIFT_help
        ‘Will Hans LIFT_help his son?’
        ⇒ if Hans helps his son, lifting will be involved
     b. Wird Hans seinem Sohn LIFT_so / durch Heben helfen?
        will Hans his son LIFT_this_way / by lifting help
        ‘Will Hans help his son like LIFT_this / by lifting him?’
        ⇏ if Hans helps his son, lifting will be involved
The present analysis leads one to predict that, for purposes of redundancy computation, the VP in (13b) should be processed before the PP modifier, despite the fact that the PP linearly precedes the VP. This prediction appears to be borne out, as the contrasts in (14) and (15) suggest.
(14) a. Merkel hat mit klaren / Macrons / ihren eigenen Worten gesprochen.
        Merkel has with clear / Macron’s / her own words spoken
        ‘Merkel spoke with clear / Macron’s / her own words.’
     b. #Merkel hat mit Worten gesprochen.
        Merkel has with words spoken
(15) a. Hans hat sich mit einer komischen Bewegung bewegt.
        Hans has himself with a funny movement moved
        ‘Hans moved with a funny movement.’
     b. #Hans hat sich mit einer Bewegung bewegt.
        Hans has himself with a movement moved
We can reason as follows.
If for purposes of redundancy computation the PPs were computed before the verbs, the contrasts would not be derived. Specifically, (14a) and (14b) should have the same status and should both be acceptable, since the final verb does contribute information that does not follow from the content of the PPs—for instance, gesprochen ‘spoken’ could be replaced with geantwortet ‘answered’, which need not entail spoken, as an answer could be written. Thus, the choice between the two verbs is informative even after the PP has been processed. Similarly, (15a) and (15b) should have the same status as each other, and here both should presumably be deviant, since the final verb bewegt ‘moved’ does not contribute information that doesn’t already follow from the PP.
By contrast, things fall into place if for purposes of triviality computation the verbs are processed before the PPs. In (14a) and (15a), the PPs add information to the verb, but this is not the case in (14b) and (15b): for Merkel to speak entails that she does so in words, and similarly if Hans moves, he certainly does so with movement.
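The two processing orders can be compared in a toy model. The sketch below assumes a hypothetical domain of communicative events in which speaking always involves words while answering need not; the event inventory and the function name are illustrative assumptions, not an analysis of German.

```python
# Toy domain (an assumption for illustration): communicative events as (act, medium).
EVENTS = frozenset({
    ("speak", "words"),    # speaking, necessarily with words
    ("answer", "words"),   # an answer given in words (possibly written)
    ("answer", "nod"),     # an answer given without words
})
SPOKEN = frozenset(e for e in EVENTS if e[0] == "speak")      # gesprochen
WITH_WORDS = frozenset(e for e in EVENTS if e[1] == "words")  # mit Worten

def redundant_after(first, second, context=EVENTS):
    """True if 'second' adds nothing once 'first' has been processed."""
    return (context & first) <= second

# Verb processed before the PP: 'mit Worten' is trivial after 'gesprochen'.
verb_first = redundant_after(SPOKEN, WITH_WORDS)
# PP processed before the verb: 'gesprochen' is still informative after 'mit Worten'.
pp_first = redundant_after(WITH_WORDS, SPOKEN)
print(verb_first, pp_first)  # True False
```

Only the verb-first order yields a redundancy in this model, which matches the deviance of (14b); the PP-first order would leave the contrast between (14a) and (14b) unexplained.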
A further and largely orthogonal question is how these facts bear on recent theories of linear or hierarchical biases in the computation of presupposition projection and local redundancy (Chierchia 2009, Ingason 2016, Mayr and Romoli 2016, Mandelkern and Romoli 2017, Chung to appear). It is an important but difficult question, as it interacts with the complex issue of the syntactic analysis of adverbials in German (e.g., Frey and Pittner 1998); I thus leave it for future research, while keeping the result that in the case at hand local redundancy patterns as is expected in view of the observed cosuppositions triggered by co-speech gestures.
3.2 Cosuppositions Triggered by Words?
One final question could be raised: could there be spoken expressions that trigger cosuppositions because their “articulated” competitor involves a modifier? For instance, one may ask (following suggestions by Chris Kennedy and Anna Szabolcsi (pers. comm.)) whether this analysis extends to verbs that encode manner modifications, as in (16a), which might compete with (16b) (the focus on none is intended to avoid focus on the modifier, which might suffice to trigger “givenness” inferences that look like presuppositions, as discussed in Abrusan 2013).
(16) a. NoneF of these 10 guys drove / swam to the bridge.
     b. NoneF of these 10 guys got to the bridge by driving / got to the bridge by swimming.
Extending Be Articulate to (16a) would predict an inference that for each of these 10 guys, if he had gotten to the bridge, he would have done so by driving / swimming. As things stand, the data do not seem sufficiently clear to me, and they would require further investigation.
In sum, I have argued that the conditionalized nature of gestural cosuppositions might follow from the Transparency theory of presupposition projection, combined with the assumption that the natural “articulated” alternative to a VP with a co-speech gesture involves an explicit PP modifier (e.g., like this). In English, this modifier comes after the VP. As a result, to make it trivial (and thus rule out the articulated competitor, leaving the co-speech gesture as the “winner”), the VP must be presupposed to entail the content of the gesture—hence the conditionalization. But the linear version of the Transparency theory makes incorrect predictions when the PP modifier comes before the verb, as in German. Still, I have provided independent evidence (based on redundancy effects) that this is not due to the proposed extension of the Transparency theory to co-speech gestures and PP modifiers; rather, it is due to the fact that, in these cases at least, processing order is not linear but structural. Finally, I have speculated that cosuppositions might exist in other domains in which an expression competes with an articulated alternative that involves a PP modifier.
From a broader perspective, the non-at-issue nature of co-speech gestures suggests that these make different contributions from iconic modulations, which can be at-issue. While the iconic content of these modulations might be better expressed with gestures than with words, co-speech gestures interact in a different way with logical operators due to their cosuppositional status.
1 It does not matter for theoretical purposes whether the at-issue control involves like this or by lifting him. The former has the advantage of offering a minimal pair with the co-verbal gesture in (3a), but there is nothing particularly mysterious about the workings of the gesture in (3b): this is a demonstrative element that refers to a contextual element which, in the case at hand, happens to be a demonstration realized by a gesture.
3 Specifically, in Schlenker 2018a,b I argue that they have the distribution of appositive relative clauses rather than of presupposition triggers. Thus, (5) and (6) can be complemented with (i) and (ii), in which – LIFT encodes a gesture that follows the end of the sentence and behaves like the appositives in (5b) and (6b).
(i) One/#None of these 10 guys helped his son – LIFT.
(ii) It’s likely/#It’s unlikely that John helped his son – LIFT.
In fact, for the proposals in Schlenker 2015, 2018a,b this difference between co- and post-speech gestures serves as an argument against the supplemental theory of co-speech gestures, which has initial difficulties accounting for this contrast.
4 All acceptability and inferential judgments were confirmed by one native speaker from Germany and one native speaker from Austria. Most acceptability judgments were also discussed with one further native speaker from Germany and one further native speaker from Austria (the latter also confirmed the inferential judgments).
For helpful theoretical or empirical discussions, I wish to thank Dylan Bumford, Emmanuel Chemla, Chris Kennedy, Nathan Klinedinst, Jeremy Kuhn, Rob Pasternak, Anna Szabolcsi, Lyn Tieu, and the participants of my NYU seminar in Fall 2015. Many thanks to Cornelia Ebert, Manuel Križ, Clemens Mayr, and Ulrich Miksch for discussion of German data. This squib greatly benefited from the remarks of anonymous reviewers for Linguistic Inquiry and of the editors of the Squibs and Discussion section. I am particularly grateful to an anonymous reviewer who suggested that I discuss German in the context of this squib.
Author’s affiliations: Institut Jean-Nicod (ENS - EHESS - CNRS), Département d’Etudes Cognitives, Ecole Normale Supérieure, Paris, France; PSL Research University; New York University, New York.
Grant acknowledgments: The research leading to these results received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007–2013) / ERC Grant Agreement N°324115–FRONTSEM (PI: Schlenker). Research was conducted at Institut d’Etudes Cognitives, Ecole Normale Supérieure - PSL Research University. Institut d’Etudes Cognitives is supported by grant ANR-17-EURE-0017.