Patrick Fernandes
Journal Articles
Assessing the Role of Context in Chat Translation Evaluation: Is Context Helpful and Under What Conditions?
Open Access
Transactions of the Association for Computational Linguistics (2024) 12: 1250–1267.
Published: 30 September 2024
Abstract
Despite the recent success of automatic metrics for assessing translation quality, their application in evaluating the quality of machine-translated chats has been limited. Unlike more structured texts like news, chat conversations are often unstructured, short, and heavily reliant on contextual information. This raises questions about the reliability of existing sentence-level metrics in this domain, as well as about the role of context in assessing translation quality. Motivated by this, we conduct a meta-evaluation of existing automatic metrics, primarily designed for structured domains such as news, to assess the quality of machine-translated chats. We find that reference-free metrics lag behind reference-based ones, especially when evaluating translation quality in out-of-English settings. We then investigate how incorporating conversational contextual information in these metrics for sentence-level evaluation affects their performance. Our findings show that augmenting neural learned metrics with contextual information helps improve correlation with human judgments in the reference-free scenario and when evaluating translations in out-of-English settings. Finally, we propose a new evaluation metric, Context-MQM, which utilizes bilingual context with a large language model (LLM), and further validate that adding context helps even for LLM-based evaluation metrics.
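The abstract describes Context-MQM only at a high level. As a rough illustration of the idea, the minimal sketch below assembles an MQM-style evaluation prompt that prepends preceding bilingual chat turns to the segment being scored; the prompt wording, the number of context turns, and the `call_llm` hook are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch of a Context-MQM-style prompt (assumptions, not the
# paper's exact prompt): bilingual conversational context is prepended to an
# MQM-style error-annotation request for the current source/translation pair.

from typing import Callable, List, Tuple

def build_context_mqm_prompt(
    context: List[Tuple[str, str]],   # preceding turns as (source, translation) pairs
    source: str,                      # current source segment to evaluate
    translation: str,                 # candidate translation of the current segment
    max_context_turns: int = 3,       # how many preceding turns to include (assumption)
) -> str:
    """Assemble an MQM-style evaluation prompt that includes bilingual chat context."""
    lines = [
        "You are evaluating the translation of one turn in a chat conversation.",
        "Conversation context (previous turns, source ||| translation):",
    ]
    for src, tgt in context[-max_context_turns:]:
        lines.append(f"{src} ||| {tgt}")
    lines += [
        "",
        f"Current source: {source}",
        f"Current translation: {translation}",
        "List translation errors with MQM categories and severities "
        "(minor/major/critical), then give an overall quality score from 0 to 100.",
    ]
    return "\n".join(lines)

def context_mqm_evaluate(prompt: str, call_llm: Callable[[str], str]) -> str:
    """`call_llm` is any user-supplied function that sends the prompt to an LLM
    and returns its textual reply (error list plus score)."""
    return call_llm(prompt)
```

In this sketch the LLM call is left abstract so the same prompt-building logic can be used with any chat or completion API.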
Journal Articles
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Open Access
Transactions of the Association for Computational Linguistics (2023) 11: 1643–1668.
Published: 19 December 2023
Abstract
Natural language generation has witnessed significant advancements due to the training of large language models on vast internet-scale datasets. Despite these advancements, there exists a critical challenge: These models can inadvertently generate content that is toxic, inaccurate, and unhelpful, and existing automatic evaluation metrics often fall short of identifying these shortcomings. As models become more capable, human feedback is an invaluable signal for evaluating and improving models. This survey aims to provide an overview of recent research that has leveraged human feedback to improve natural language generation. First, we introduce a taxonomy distilled from existing research to categorize and organize the varied forms of feedback. Next, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using feedback or training feedback models. We also discuss existing datasets for human-feedback data collection, and concerns surrounding feedback collection. Finally, we provide an overview of the nascent field of AI feedback, which uses large language models to make judgments based on a set of principles and minimize the need for human intervention. We also release a website of this survey at feedback-gap-survey.info.
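The survey's distinction between using feedback for training and using it at decoding time can be made concrete with a small sketch. The example below shows best-of-n reranking with a learned feedback (reward) model at decoding time; it is an illustration of one approach the abstract mentions, not code from the survey, and the `generate_candidates` and `feedback_model_score` hooks are assumed stand-ins for any generator and any learned feedback model.

```python
# Illustrative sketch (assumptions, not from the survey): decoding-time use of
# a trained feedback model via best-of-n reranking. Candidates are sampled from
# a generator and the one the feedback model scores highest is returned.

from typing import Callable, List

def best_of_n(
    prompt: str,
    generate_candidates: Callable[[str, int], List[str]],  # returns n sampled outputs
    feedback_model_score: Callable[[str, str], float],      # higher = preferred by the feedback model
    n: int = 8,
) -> str:
    """Sample n candidate generations for `prompt` and return the one the
    feedback model prefers."""
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=lambda c: feedback_model_score(prompt, c))
```

Training-time use of feedback (e.g., fine-tuning on preference data) would instead fold the feedback model's signal into the generator's parameters rather than applying it per query at inference.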