Polite Speech Emerges From Competing Social Goals

Language is a remarkably efficient tool for transmitting information. Yet human speakers make statements that are inefficient, imprecise, or even contrary to their own beliefs, all in the service of being polite. What rational machinery underlies polite language use? Here, we show that polite speech emerges from the competition of three communicative goals: to convey information, to be kind, and to present oneself in a good light. We formalize this goal tradeoff using a probabilistic model of utterance production, which predicts human utterance choices in socially sensitive situations with high quantitative accuracy, and we show that our full model is superior to its variants with subsets of the three goals. This utility-theoretic approach to speech acts takes a step toward explaining the richness and subtlety of social language use.

probability that the utterance w is true of state s; this kind of lexicon is a generalization of the more traditional, binary truth-functional semantics (see Degen, Hawkins, Graf, Kreiss, and Goodman (2020) for a discussion of this kind of "soft semantics"). This probability is estimated empirically from the Literal Semantics task described in the next section, where it is roughly the proportion of participants who endorse utterance w in state s.
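The literal listener that this soft semantics feeds into can be sketched as follows. This is a minimal illustration, not the paper's model code: the endorsement proportions below are invented placeholders, whereas the paper estimates them from the Literal Semantics task.

```python
import numpy as np

# Illustrative soft semantics: rows = utterances, columns = states (0-3 hearts).
# Each entry stands in for the proportion of participants endorsing the
# utterance in that state; these numbers are hypothetical.
utterances = ["terrible", "bad", "good", "amazing"]
semantics = np.array([
    [0.95, 0.60, 0.10, 0.05],  # "terrible"
    [0.85, 0.90, 0.20, 0.05],  # "bad"
    [0.05, 0.20, 0.90, 0.85],  # "good"
    [0.05, 0.10, 0.60, 0.95],  # "amazing"
])

def literal_listener(u_idx, prior=None):
    """L0(s | w) is proportional to [[w]](s) * P(s); uniform prior by default."""
    prior = np.ones(semantics.shape[1]) if prior is None else np.asarray(prior)
    scores = semantics[u_idx] * prior
    return scores / scores.sum()

print(literal_listener(utterances.index("good")))
```

Because "good" is endorsed at both the 2- and 3-heart states in this toy lexicon, the literal listener spreads its belief across the top of the scale rather than concentrating on a single state.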
The first-order speaker S1 chooses utterances approximately optimally given a utility function, which can be decomposed into two components: informational and social utility.

First, informational utility (U_inf) is the negative of the amount of information a literal listener L0 would still be missing about the world state s after hearing the speaker's utterance w, and is given by the log probability of the world state given the utterance, ln L0(s | w). Second, social utility (U_soc) is the expected subjective utility of the state inferred given the utterance w: U_soc(w) = E_{L0(s|w)}[V(s)], where V denotes the subjective utility function that maps states of the world onto subjective values. For this paper, we assume that V is the identity function, returning the number of hearts that defines a state. The overall utility of an utterance subtracts the cost C(w) from the weighted combination of the social and informational utilities, and the speaker then chooses utterances softmax-optimally given the state and their goal-weight mixture φ.
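The first-order speaker can be sketched as below. This is an illustration under stated assumptions: the lexicon, the softmax temperature, and the per-word cost are placeholders, not the paper's fitted values, and V(s) = s is the identity function as stated above.

```python
import numpy as np

# Illustrative soft-semantics lexicon (hypothetical values, not the
# empirical estimates); rows = utterances, columns = states (0-3 hearts).
utterances = ["terrible", "bad", "not terrible", "not bad",
              "good", "amazing", "not good", "not amazing"]
semantics = np.array([
    [0.95, 0.60, 0.10, 0.05],   # terrible
    [0.85, 0.90, 0.20, 0.05],   # bad
    [0.05, 0.40, 0.90, 0.95],   # not terrible
    [0.15, 0.10, 0.80, 0.95],   # not bad
    [0.05, 0.20, 0.90, 0.85],   # good
    [0.05, 0.10, 0.60, 0.95],   # amazing
    [0.95, 0.80, 0.10, 0.15],   # not good
    [0.95, 0.90, 0.40, 0.05],   # not amazing
])
states = np.arange(4)                      # 0-3 hearts; V(s) = s (identity)
cost = np.array([1, 1, 2, 2, 1, 1, 2, 2])  # word count: negation costs a word

def L0(u):
    """Literal listener with a uniform state prior."""
    p = semantics[u]
    return p / p.sum()

def S1(state, phi, lam=2.0, c=1.0):
    """S1(w | s, phi) softmax-optimizes
    U1 = phi * ln L0(s|w) + (1 - phi) * E_L0[V(s)] - c * C(w)."""
    utils = []
    for u in range(len(utterances)):
        post = L0(u)
        u_inf = np.log(post[state])       # informational utility
        u_soc = post @ states             # social utility: E[V(s)]
        utils.append(phi * u_inf + (1 - phi) * u_soc - c * cost[u])
    scores = np.exp(lam * np.array(utils))
    return scores / scores.sum()
```

With phi close to 1 the speaker behaves informatively (e.g., preferring "amazing" in the 3-heart state), while with phi close to 0 the speaker favors whichever utterance leads the listener to infer the kindest state, regardless of truth.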
The equations for the literal listener L0 and first-order speaker S1 are described in the main text. The next equation defines a pragmatic listener L1 who jointly reasons about the state of the world and the first-order speaker's utility weighting φ (social vs. informational utility). Again, we assume a uniform prior over states as well as an uninformed prior over the speaker's utility weights. The final equation defines a second-order pragmatic speaker S2 who produces utterances approximately optimally given a three-component utility function: informational, social, and presentational utilities. These utilities are defined with respect to the pragmatic listener L1 and are computed by marginalizing L1's joint distribution over states and utility weights: L1(s | w) = ∫ L1(s, φ | w) dφ and L1(φ | w) = ∫ L1(s, φ | w) ds. The definitions of social and informational utility mirror those of the first-order speaker, except that they are defined with respect to the L1 distribution rather than the L0 distribution. The self-presentational utility is the negative surprisal of the goal-weight parameter that L1 reasons about: ln L1(φ | w). These three utilities are then weighted by a set of three mixture components ω, which are inferred from the data separately for each experimental condition (see main text for details).
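The L1/S2 recursion above can be sketched as a self-contained toy, again with an invented lexicon, a coarse grid over the goal weight φ, and placeholder softmax, cost, and mixture values rather than anything fitted to data.

```python
import numpy as np

# Self-contained sketch: illustrative semantics, not the empirical estimates.
utterances = ["bad", "not bad", "good", "not good"]
semantics = np.array([
    [0.90, 0.60, 0.20, 0.05],   # bad
    [0.10, 0.40, 0.80, 0.95],   # not bad
    [0.05, 0.20, 0.90, 0.85],   # good
    [0.95, 0.80, 0.10, 0.15],   # not good
])
states = np.arange(4)                 # 0-3 hearts; V(s) = s
cost = np.array([1, 2, 1, 2])
phis = np.linspace(0.05, 0.95, 10)    # grid approximating the phi integral
LAM = 2.0                             # illustrative softmax parameter

def L0(u):
    p = semantics[u]
    return p / p.sum()

def S1(u, s, phi):
    utils = np.array([phi * np.log(L0(v)[s]) + (1 - phi) * (L0(v) @ states)
                      - cost[v] for v in range(len(utterances))])
    probs = np.exp(LAM * utils)
    return (probs / probs.sum())[u]

def L1(u):
    """Joint posterior L1(s, phi | w), uniform priors over s and phi."""
    joint = np.array([[S1(u, s, phi) for phi in phis] for s in states])
    return joint / joint.sum()

def S2(s, phi_hat_idx, omega=(0.4, 0.3, 0.3)):
    """S2 mixes informational ln L1(s|w), social E_L1[V(s)], and
    presentational ln L1(phi_hat|w) utilities with weights omega."""
    w_inf, w_soc, w_pres = omega
    utils = []
    for u in range(len(utterances)):
        joint = L1(u)
        p_s = joint.sum(axis=1)       # marginal over states
        p_phi = joint.sum(axis=0)     # marginal over goal weights
        utils.append(w_inf * np.log(p_s[s]) + w_soc * (p_s @ states)
                     + w_pres * np.log(p_phi[phi_hat_idx]) - cost[u])
    probs = np.exp(LAM * np.array(utils))
    return probs / probs.sum()
```

The two `joint.sum` calls are the grid analogues of the marginalization integrals above, and the presentational term rewards utterances that lead L1 to infer the speaker's intended goal weight.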

Literal semantics task
We probed judgments of literal meanings of the target words assumed by our model and used in our main experiment.
51 participants with IP addresses in the United States were recruited on Amazon's Mechanical Turk.
We used thirteen different context items in which a speaker evaluated a performance of some kind. For example, in one of the contexts, Ann saw a presentation, and Ann's feelings toward the presentation (the true state) were shown on a scale from zero to three hearts (e.g., two out of three hearts filled in red; see Figure ?? for an example of the heart scale). The question of interest was "Do you think Ann thought the presentation was / wasn't X?" and participants responded by choosing either "no" or "yes." The target word X could be one of four words: terrible, bad, good, and amazing, giving rise to eight different possible utterances (with or without negation). Each participant read 32 scenarios, depicting every possible combination of states and utterances. The order of context items was randomized, and there were a maximum of four repeats of each context item per participant.
We analyzed the data by collapsing across context items. For each utterance-state pair, we computed the posterior distribution over the semantic weight (i.e., how consistent utterance X is with state Y), assuming a uniform prior over the weight (i.e., a standard Beta-Binomial model). Meanings of the words as judged by participants were as one would expect (Figure 3; error bars represent 95% confidence intervals). Importantly, the task does not elicit the alternative-based pragmatic reasoning that would produce pragmatically enriched meanings: for example, "good" is not interpreted to mean "not amazing," but is instead judged equally true at the 2- and 3-heart states.

Other than the speaker goal mixture weights explained in the main text (shown in Table 1), the full model has two global parameters (Table 3): the speakers' softmax parameter λ, which we assume takes the same value for both S1 and S2, and the utterance cost parameter C. We operationalize utterance cost as a penalty on the number of words in an utterance; since utterances are only one or two words long, and the two-word utterances are those involving the negation particle "not," this cost parameter can also be thought of as the cost of producing a negation particle. We use minimally assumptive priors consistent with those used for similar models in this model class: λ ∼ Uniform(0, 20), C ∼ Uniform(1, 10). Cost is assumed to be greater than 1; values less than 1 would imply that a two-word utterance is cheaper than a one-word utterance (or that there is a cost to not producing negation). Finally, we incorporate the literal semantics data into the RSA model by maintaining uncertainty about the semantic weight of each utterance for each state, assuming a Beta-Binomial linking function between these weights and the literal semantics data (see Table 4).
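The Beta-Binomial posterior over a semantic weight has a simple closed form, sketched below. The endorsement counts in the example are hypothetical, not values from our data.

```python
# With a uniform Beta(1, 1) prior and k "yes" endorsements out of n
# judgments, the posterior over the semantic weight is Beta(1 + k, 1 + n - k),
# whose mean is (k + 1) / (n + 2) (Laplace's rule of succession).
def semantic_weight_posterior(k, n):
    alpha, beta = 1 + k, 1 + (n - k)
    mean = alpha / (alpha + beta)
    return alpha, beta, mean

# Hypothetical counts: 45 of 51 participants endorse an utterance in a state.
a, b, m = semantic_weight_posterior(45, 51)
print(a, b, round(m, 3))   # 46 7 0.868
```

Maintaining the full Beta posterior, rather than a point estimate, is what lets the RSA model propagate uncertainty from the literal semantics data into its predictions.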

Model fitting and inferred parameters
We observe that the posterior distributions of the inferred parameters governing the speaker goal mixture weights are unstable across different MCMC chains. To confirm these observations, we ran 3 additional MCMC chains for 350,000 iterations. The resulting posterior distributions for the utility-weight parameters as a function of the goal condition (a "goal-centric" view) are shown in Figure 4. As can be seen by comparing across the rows of Figure 4, both the values and the relative orderings of the utility-weight parameters vary as a function of the chain. For instance, in one run of the model, the informative goal condition has as its strongest utility weight the presentational utility (chain 1); in another, the strongest utility weight is the informational utility (chain 2); in yet another, the informational and presentational weights are approximately equal in strength (chain 3). At the same time, for the informative goal condition, the social utility weight is always close to 0, in all runs of the model. Indeed, the social utility weight appears most consistent across goal conditions and chains. Further, when we examine the utility weights as a function of goal conditions (a "utility-centric" view), we find other signatures of consistency (Figure 5). The relative ordering of goals is consistent across different MCMC chains; for example, the social-utility weight is highest for the social goal condition, lowest for the informational goal condition, and in the middle for the both-goal condition. In addition, the projected social-utility weight is inferred to be more on the informational side (higher value) for the informational goal than for the social or both-goal conditions.
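Between-chain instability of this kind is exactly what standard convergence diagnostics quantify. A minimal sketch of the Gelman-Rubin R-hat statistic, which compares between-chain to within-chain variance (this is an illustration, not the diagnostic code used for our analyses):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for a set of chains,
    each a 1-D array of posterior samples for one parameter."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
mixed = [rng.normal(0, 1, 1000) for _ in range(3)]     # well-mixed chains
stuck = [rng.normal(mu, 1, 1000) for mu in (0, 3, 6)]  # disagreeing chains
print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Values near 1 indicate the chains agree on the posterior; values well above 1, as with the "stuck" chains here, indicate the kind of chain-to-chain disagreement we observe for the goal mixture weights.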
These patterns suggest that a lower-dimensional parameterization of the model may be available, though the posterior predictive fits and model comparison presented in the main text suggest that the model's parameterization has the appropriate flexibility necessary to account for our experimental data.

Data Availability
Our model, preregistration of hypotheses, procedure, data, and analyses are available at .

Figure 6. Experimental results (solid lines) and fitted predictions from the full model (dashed lines) for speaker production. Proportion of utterances chosen (utterance type, direct vs. indirect, in different colors; words shown on the x-axis) given the true states (columns) and speaker goals (rows). Error bars represent 95% confidence intervals for the data and 95% highest density intervals for the model. The black dotted line represents chance level.