Oliver Lemon
Journal Articles
Computational Linguistics (2014) 40 (4): 883–920.
Published: 01 December 2014
Abstract
We address the problem of dynamically modeling and adapting to unknown users in resource-scarce domains in the context of interactive spoken dialogue systems. As an example, we show how a system can learn to choose referring expressions to refer to domain entities for users with different levels of domain expertise, and whose domain knowledge is initially unknown to the system. We approach this problem using a three-step process: collecting data using a Wizard-of-Oz method, building simulated users, and learning to model and adapt to users using Reinforcement Learning techniques. We show that by using only a small corpus of non-adaptive dialogues and user knowledge profiles it is possible to learn an adaptive user modeling policy using a sense-predict-adapt approach. Our evaluation results show that the learned user modeling and adaptation strategies performed better in terms of adaptation than some simple hand-coded baseline policies, with both simulated and real users. With real users, the learned policy produced around a 20% increase in adaptation in comparison to an adaptive hand-coded baseline. We also show that adapting to users' domain knowledge improves task success (99.47% for the learned policy vs. 84.7% for a hand-coded baseline) and reduces dialogue time (an 11% relative reduction). We also compared the learned policy to a variety of carefully hand-crafted adaptive policies that employ the user knowledge profiles to adapt their choices of referring expressions throughout a conversation. We show that the learned policy generalises better to unseen user profiles than these hand-coded policies, while having comparable performance on known user profiles. We discuss the overall advantages of this method and how it can be extended to other levels of adaptation, such as content selection and dialogue management, and to other domains where adapting to users' domain knowledge is useful, such as travel and healthcare.
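The sense-predict-adapt loop described in this abstract can be illustrated with a short sketch: sense the user's reaction to a domain term, update an estimate of their expertise, and adapt the next referring-expression choice accordingly. All names, the update rule, and the fallback prediction below are illustrative assumptions; the paper learns this behaviour with Reinforcement Learning rather than the fixed rules used here.

```python
# A minimal sketch of a sense-predict-adapt loop for choosing between a
# technical and a descriptive referring expression. Illustrative only:
# these class/function names and the smoothed update are assumptions,
# not the paper's implementation.

class UserModel:
    """Tracks evidence about which domain terms the user appears to know."""
    def __init__(self):
        self.known = {}  # term -> estimated P(user knows the term)

    def sense(self, term, user_understood):
        """Update the estimate from the user's reaction, e.g. a
        clarification request vs. successful grounding."""
        prior = self.known.get(term, 0.5)
        obs = 1.0 if user_understood else 0.0
        self.known[term] = 0.8 * prior + 0.2 * obs  # crude smoothed update

    def predict(self, term):
        """Estimate whether the user knows a term; for unseen terms,
        fall back to average expertise over observed terms."""
        if term in self.known:
            return self.known[term]
        return (sum(self.known.values()) / len(self.known)) if self.known else 0.5

def choose_expression(term, jargon, descriptive, model, threshold=0.6):
    """Adapt: use the technical expression only for users predicted
    to know the term."""
    return jargon if model.predict(term) >= threshold else descriptive

# Usage: after observing that the user grounded "router" but not "SSID",
# the model prefers the descriptive expression for unseen jargon.
model = UserModel()
model.sense("router", True)
model.sense("SSID", False)
print(choose_expression("DNS", "the DNS server",
                        "the service that looks up website addresses", model))
```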
Journal Articles
Computational Linguistics (2011) 37 (1): 153–196.
Published: 01 March 2011
Abstract
We present a new data-driven methodology for simulation-based dialogue strategy learning, which allows us to address several problems in the field of automatic optimization of dialogue strategies: learning effective dialogue strategies when no initial data or system exists, and determining a data-driven reward function. In addition, we evaluate the result with real users, and explore how results transfer between simulated and real interactions. We use Reinforcement Learning (RL) to learn multimodal dialogue strategies by interaction with a simulated environment which is “bootstrapped” from small amounts of Wizard-of-Oz (WOZ) data. This use of WOZ data allows data-driven development of optimal strategies for domains where no working prototype is available. Using simulation-based RL allows us to find optimal policies which are not (necessarily) present in the original data. Our results show that simulation-based RL significantly outperforms the average (human wizard) strategy as learned from the data by using Supervised Learning. The bootstrapped RL-based policy gains on average 50 times more reward when tested in simulation, and almost 18 times more reward when interacting with real users. Users also subjectively rate the RL-based policy on average 10% higher. We also show that results from simulated interaction do transfer to interaction with real users, and we explicitly evaluate the stability of the data-driven reward function.
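As a rough illustration of the bootstrapping idea in this abstract, the sketch below estimates a simple user simulation from (hypothetical) WOZ system-act/user-act pairs and then optimizes a strategy against it with tabular Q-learning. The corpus format, state abstraction, and reward function are assumptions made for the sketch, not the paper's actual setup.

```python
# Sketch: bootstrap a user simulation from WOZ data, then learn a
# dialogue strategy against it with tabular Q-learning. Illustrative
# assumptions throughout: real systems use richer state and learned,
# data-driven reward functions.

import random
from collections import defaultdict

class SimulatedUser:
    """User simulation estimated from WOZ (system_act, user_act) pairs
    as P(user act | system act)."""
    def __init__(self, woz_pairs):
        counts = defaultdict(lambda: defaultdict(int))
        for sys_act, user_act in woz_pairs:
            counts[sys_act][user_act] += 1
        self.dist = {s: (list(d.keys()), list(d.values()))
                     for s, d in counts.items()}

    def respond(self, sys_act):
        acts, weights = self.dist.get(sys_act, (["hangup"], [1]))
        return random.choices(acts, weights=weights)[0]

def q_learn(sim_user, sys_acts, reward_fn, episodes=5000,
            alpha=0.1, gamma=0.95, epsilon=0.2, max_turns=20):
    """Tabular Q-learning; the state is just the last user act, a
    deliberately crude abstraction for the sketch."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state = "start"
        for _ in range(max_turns):
            if random.random() < epsilon:          # explore
                action = random.choice(sys_acts)
            else:                                   # exploit
                action = max(sys_acts, key=lambda a: Q[(state, a)])
            user_act = sim_user.respond(action)
            reward = reward_fn(state, action, user_act)
            best_next = max(Q[(user_act, a)] for a in sys_acts)
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = user_act
            if user_act == "hangup":
                break
    return Q
```

Because the learner explores actions beyond those in the corpus, the optimized policy need not coincide with any strategy present in the original WOZ data, which is the point the abstract makes about simulation-based RL.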
Journal Articles
Computational Linguistics (2008) 34 (4): 487–511.
Published: 01 December 2008
Abstract
We propose a method for learning dialogue management policies from a fixed data set. The method addresses the challenges posed by Information State Update (ISU)-based dialogue systems, which represent the state of a dialogue as a large set of features, resulting in a very large state space and a huge policy space. To address the problem that any fixed data set will only provide information about small portions of these state and policy spaces, we propose a hybrid model that combines reinforcement learning with supervised learning: reinforcement learning optimizes a measure of dialogue reward, while supervised learning restricts the learned policy to the portions of these spaces for which we have data. We also use linear function approximation to generalize from a fixed amount of data to large state spaces. To demonstrate the method's effectiveness on this challenging task, we trained the model on the COMMUNICATOR corpus, to which we have added annotations for user actions and Information States. When tested with a user simulation trained on a different part of the same data set, our hybrid model outperforms a pure supervised learning model and a pure reinforcement learning model. It also outperforms the hand-crafted systems on the COMMUNICATOR data according to automatic evaluation measures, improving over the average COMMUNICATOR system policy by 10%. The proposed method will improve techniques for bootstrapping and automatically optimizing dialogue management policies from limited initial data sets.
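A minimal sketch of the hybrid idea, under assumptions: a linear Q-function over Information State feature vectors supplies the RL side, and a nearest-neighbour filter over the corpus stands in for the supervised component that keeps the policy within the data-supported region. The feature layout and the filtering rule are illustrative, not the paper's implementation.

```python
# Sketch of the hybrid model's two ingredients: linear function
# approximation for Q-values over Information State features, plus a
# supervised restriction to corpus-supported actions. The k-NN filter
# here is an assumed stand-in for the paper's supervised model.

import numpy as np

def q_value(weights, state_features, action):
    """Linear Q: one weight block per action over the shared
    state-feature vector."""
    d = state_features.shape[0]
    return float(weights[action * d:(action + 1) * d] @ state_features)

def corpus_supported_actions(state_features, corpus_states, corpus_actions, k=5):
    """Supervised restriction: only actions observed in the k most
    similar corpus states are eligible."""
    dists = np.linalg.norm(corpus_states - state_features, axis=1)
    nearest = np.argsort(dists)[:k]
    return sorted({int(corpus_actions[i]) for i in nearest})

def choose_action(weights, state_features, corpus_states, corpus_actions):
    """Greedy RL choice, restricted to the data-supported action set."""
    candidates = corpus_supported_actions(state_features, corpus_states,
                                          corpus_actions)
    return max(candidates, key=lambda a: q_value(weights, state_features, a))
```

The design point the abstract makes is the composition: the linear Q-function lets value estimates generalize across a large feature space, while the supervised filter prevents the policy from wandering into regions where the fixed corpus provides no evidence.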