Formal probabilistic models, such as the Rational Speech Act model, are widely used to formalize the reasoning involved in various pragmatic phenomena, and when a model achieves a good fit to experimental data, this is interpreted as evidence that the model successfully captures some of the underlying processes. Yet how can we be sure that participants' performance on the task results from successful reasoning rather than from some feature of the experimental setup? In this study, we carefully manipulate the properties of stimuli that have been used in several pragmatics studies and elicit participants' reasoning strategies. We show that certain biases in experimental design inflate participants' performance on the task. We then repeat the experiment with a new version of the stimuli that is less susceptible to the identified biases, obtaining a somewhat smaller effect size and more reliable estimates of individual-level performance.
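
As general background, and not necessarily the exact variant evaluated in this study, the Rational Speech Act framework is standardly formulated as a recursion between a literal listener, a pragmatic speaker, and a pragmatic listener (as originally proposed by Frank & Goodman, 2012). A minimal sketch of that standard formulation, with the rationality parameter $\alpha$ and cost term $\mathrm{cost}(u)$ included as commonly assumed extensions:

$$L_0(m \mid u) \propto [\![u]\!](m)\, P(m)$$
$$S_1(u \mid m) \propto \exp\big(\alpha\,[\log L_0(m \mid u) - \mathrm{cost}(u)]\big)$$
$$L_1(m \mid u) \propto S_1(u \mid m)\, P(m)$$

Here $m$ ranges over candidate meanings (world states), $u$ over utterances, $[\![u]\!](m)$ is the literal semantics of $u$ evaluated at $m$, and $P(m)$ is the listener's prior over meanings. Good model fit is then assessed by comparing $L_1(m \mid u)$ to participants' interpretation choices.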
