Recent work in natural language generation has begun to take linguistic variation into account, developing algorithms that are capable of modifying the system's linguistic style based either on the user's linguistic style or other factors, such as personality or politeness. While stylistic control has traditionally relied on handcrafted rules, statistical methods are likely to be needed for generation systems to scale to the production of the large range of variation observed in human dialogues. Previous work on statistical natural language generation (SNLG) has shown that the grammaticality and naturalness of generated utterances can be optimized from data; however these data-driven methods have not been shown to produce stylistic variation that is perceived by humans in the way that the system intended. This paper describes Personage, a highly parameterizable language generator whose parameters are based on psychological findings about the linguistic reflexes of personality. We present a novel SNLG method which uses parameter estimation models trained on personality-annotated data to predict the generation decisions required to convey any combination of scalar values along the five main dimensions of personality. A human evaluation shows that parameter estimation models produce recognizable stylistic variation along multiple dimensions, on a continuous scale, and without the computational cost incurred by overgeneration techniques.
This work was done at University of Sheffield. The author's present address is: Engineering Department, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, United Kingdom, E-mail: firstname.lastname@example.org.
Baskin School of Engineering, University of California, 1156 High Street, SOE-3, Santa Cruz, CA 95064, E-mail: email@example.com.