Christopher Potts
Journal Articles
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
Open Access. Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2025) 13: 442–460.
Published: 24 April 2025
Abstract
Large Language Models (LLMs) are often aligned using contrastive alignment objectives and preference pair datasets. The interaction between model, paired data, and objective makes alignment a complicated procedure, sometimes producing subpar results. We study this and find that (i) preference data gives a better learning signal when the underlying responses are contrastive, and (ii) alignment objectives lead to better performance when they specify more control over the model during training. Based on these insights, we introduce Contrastive Learning from AI Revisions (CLAIR), a data-creation method which leads to more contrastive preference pairs, and Anchored Preference Optimization (APO), a controllable and more stable alignment objective. We align Llama-3-8B-Instruct using various comparable datasets and alignment objectives and measure MixEval-Hard scores, which correlate highly with human judgments. The CLAIR preferences lead to the strongest performance out of all datasets, and APO consistently outperforms less controllable objectives. Our best model, trained on 32K CLAIR preferences with APO, improves Llama-3-8B-Instruct by 7.65%, closing the gap with GPT4-turbo by 45%. Our code and datasets are available.
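As a rough illustration of the kind of objective the abstract is contrasting, the sketch below shows a standard DPO-style preference loss next to a hypothetical "anchored" variant that additionally pins each response's implicit reward. This is an assumption about the general shape of such objectives, not the paper's exact APO formulation; the function names and the beta value are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss over a batch of preference pairs.

    logp_w / logp_l: policy log-likelihoods of the preferred / dispreferred
    responses; ref_* are the same quantities under a frozen reference model.
    """
    reward_w = beta * (logp_w - ref_logp_w)   # implicit reward of the winner
    reward_l = beta * (logp_l - ref_logp_l)   # implicit reward of the loser
    # Only the margin is constrained: the winner may lose likelihood
    # as long as the loser loses more.
    return -F.logsigmoid(reward_w - reward_l).mean()

def anchored_loss_sketch(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Hypothetical anchored variant (NOT the paper's exact APO objective):
    each implicit reward is pinned separately, pushing the winner's reward
    above zero and the loser's below zero, which is the kind of extra
    per-response control the abstract describes."""
    reward_w = beta * (logp_w - ref_logp_w)
    reward_l = beta * (logp_l - ref_logp_l)
    return (-F.logsigmoid(reward_w) - F.logsigmoid(-reward_l)).mean()

# Toy usage with random per-pair log-likelihoods standing in for model outputs.
x = [torch.randn(16) for _ in range(4)]
print(dpo_loss(*x), anchored_loss_sketch(*x))
```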
Journal Articles
ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation
Open Access. Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2023) 11: 1719–1733.
Published: 21 December 2023
Abstract
Compositional generalization benchmarks for semantic parsing seek to assess whether models can accurately compute meanings for novel sentences, but operationalize this in terms of logical form (LF) prediction. This raises the concern that semantically irrelevant details of the chosen LFs could shape model performance. We argue that this concern is realized for the COGS benchmark (Kim and Linzen, 2020). COGS poses generalization splits that appear impossible for present-day models, which could be taken as an indictment of those models. However, we show that the negative results trace to incidental features of COGS LFs. Converting these LFs to semantically equivalent ones and factoring out capabilities unrelated to semantic interpretation, we find that even baseline models get traction. A recent variable-free translation of COGS LFs suggests similar conclusions, but we observe this format is not semantically equivalent; it is incapable of accurately representing some COGS meanings. These findings inform our proposal for ReCOGS, a modified version of COGS that comes closer to assessing the target semantic capabilities while remaining very challenging. Overall, our results reaffirm the importance of compositional generalization and careful benchmark task design.
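The central point, that logical forms can differ in incidental details while expressing the same meaning, can be made concrete with a toy equivalence check. The sketch below uses schematic LF strings (not the exact COGS string format) and a brute-force test for equality up to conjunct order and bijective variable renaming; all names are illustrative.

```python
import re
from itertools import permutations

def parse(lf):
    """Split a toy LF into (predicate, argument-tuple) conjuncts."""
    conjs = []
    for c in lf.split(" AND "):
        pred, args = re.match(r"\s*([\w.]+)\((.*)\)\s*", c).groups()
        conjs.append((pred, tuple(a.strip() for a in args.split(","))))
    return conjs

def equivalent(lf_a, lf_b):
    """True iff the two LFs are identical up to conjunct order and a
    bijective renaming of variables (brute force; fine for toy inputs)."""
    a, b = parse(lf_a), parse(lf_b)
    vars_a = sorted({v for _, args in a for v in args})
    vars_b = sorted({v for _, args in b for v in args})
    if len(a) != len(b) or len(vars_a) != len(vars_b):
        return False
    for perm in permutations(vars_b):
        rename = dict(zip(vars_a, perm))
        if {(p, tuple(rename[v] for v in args)) for p, args in a} == set(b):
            return True
    return False

# Same meaning, different incidental details (variable names, conjunct order):
print(equivalent("cat(x_1) AND smile.agent(x_2, x_1)",
                 "smile.agent(e, c) AND cat(c)"))        # True
# Different semantic roles, so genuinely different meanings:
print(equivalent("cat(x_1) AND smile.agent(x_2, x_1)",
                 "cat(x_1) AND smile.theme(x_2, x_1)"))  # False
```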
Journal Articles
Relevance-guided Supervision for OpenQA with ColBERT
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2021) 9: 929–944.
Published: 08 September 2021
Abstract
Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages. We argue that this modeling choice is insufficiently expressive for dealing with the complexity of natural language questions. To address this, we define ColBERT-QA, which adapts the scalable neural retrieval model ColBERT to OpenQA. ColBERT creates fine-grained interactions between questions and passages. We propose an efficient weak supervision strategy that iteratively uses ColBERT to create its own training data. This greatly improves OpenQA retrieval on Natural Questions, SQuAD, and TriviaQA, and the resulting system attains state-of-the-art extractive OpenQA performance on all three datasets.
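The "fine-grained interactions" refer to ColBERT's late-interaction scoring, in which every question token is matched against its most similar passage token (MaxSim) and the per-token maxima are summed. A minimal sketch of that score, with random normalized embeddings standing in for the BERT encoder outputs the real system computes:

```python
import torch

def late_interaction_score(q_emb, p_emb):
    """ColBERT-style late-interaction relevance score.

    q_emb: (num_query_tokens, dim) question token embeddings
    p_emb: (num_passage_tokens, dim) passage token embeddings
    Both assumed L2-normalized, so dot products are cosine similarities.
    """
    sim = q_emb @ p_emb.T                 # (num_q, num_p) token-level similarities
    # Each query token keeps only its best-matching passage token (MaxSim),
    # and the per-token maxima are summed into the passage score.
    return sim.max(dim=1).values.sum()

# Toy usage with random, normalized embeddings.
q = torch.nn.functional.normalize(torch.randn(8, 128), dim=-1)
p = torch.nn.functional.normalize(torch.randn(40, 128), dim=-1)
print(late_interaction_score(q, p))
```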
Journal Articles
Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2017) 5: 325–338.
Published: 01 September 2017
Abstract
We present a model of pragmatic referring expression interpretation in a grounded communication task (identifying colors from descriptions) that draws upon predictions from two recurrent neural network classifiers, a speaker and a listener, unified by a recursive pragmatic reasoning framework. Experiments show that this combined pragmatic model interprets color descriptions more accurately than the classifiers from which it is built, and that much of this improvement results from combining the speaker and listener perspectives. We observe that pragmatic reasoning helps primarily in the hardest cases: when the model must distinguish very similar colors, or when few utterances adequately express the target color. Our findings make use of a newly-collected corpus of human utterances in color reference games, which exhibit a variety of pragmatic behaviors. We also show that the embedded speaker model reproduces many of these pragmatic behaviors.
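The recursive pragmatic reasoning framework here is in the Rational Speech Acts family: a base listener is inverted into a speaker, which is inverted again into a pragmatic listener that reasons about alternative utterances. A minimal numpy sketch of one such step; the literal-listener matrix is a toy stand-in for the paper's trained neural listener, and all names are illustrative.

```python
import numpy as np

def pragmatic_listener(literal_listener, prior=None, alpha=1.0):
    """One step of RSA-style recursive pragmatic reasoning.

    literal_listener: (num_utterances, num_colors) matrix; row u is the base
    listener's distribution over referent colors given utterance u.
    Returns a (num_utterances, num_colors) pragmatic listener matrix.
    """
    if prior is None:
        prior = np.full(literal_listener.shape[1], 1.0 / literal_listener.shape[1])
    # Pragmatic speaker: prefers utterances the literal listener would
    # resolve to the intended color (normalized over utterances).
    speaker = literal_listener ** alpha
    speaker = speaker / speaker.sum(axis=0, keepdims=True)
    # Pragmatic listener: Bayes-inverts the speaker against the color prior
    # (normalized over colors).
    listener = speaker * prior
    return listener / listener.sum(axis=1, keepdims=True)

# Toy example: two utterances and two candidate colors.
L0 = np.array([[0.6, 0.4],    # "blue" is ambiguous between the two colors
               [0.1, 0.9]])   # "teal" strongly picks out color 2
# Pragmatic reasoning sharpens "blue" toward color 1, since a speaker
# intending color 2 would have said "teal".
print(pragmatic_listener(L0))
```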
Journal Articles
Exploiting Social Network Structure for Person-to-Person Sentiment Analysis
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2014) 2: 297–310.
Published: 01 October 2014
Abstract
Person-to-person evaluations are prevalent in all kinds of discourse and important for establishing reputations, building social bonds, and shaping public opinion. Such evaluations can be analyzed separately using signed social networks and textual sentiment analysis, but this misses the rich interactions between language and social context. To capture such interactions, we develop a model that predicts individual A's opinion of individual B by synthesizing information from the signed social network in which A and B are embedded with sentiment analysis of the evaluative texts relating A to B. We prove that this problem is NP-hard but can be relaxed to an efficiently solvable hinge-loss Markov random field, and we show that this implementation outperforms text-only and network-only versions in two very different datasets involving community-level decision-making: the Wikipedia Requests for Adminship corpus and the Convote U.S. Congressional speech corpus.
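To make the relaxation concrete: in a hinge-loss MRF, opinions become variables in [0, 1], textual sentiment supplies evidence terms, and balance-style rules over signed triads become hinge penalties, leaving a convex objective. The sketch below is a toy illustration of that construction, not the paper's model or its inference procedure; here it is optimized by a crude coordinate search, and all names and numbers are illustrative.

```python
import numpy as np

# Toy hinge-loss relaxation in the spirit of PSL (illustrative only).
# y[i, j] in [0, 1] is the relaxed "person i regards person j positively" variable.
n = 3                                   # people 0 (A), 1 (B), 2 (C)
rng = np.random.default_rng(0)
y = rng.uniform(size=(n, n))

# Evidence from text: sentiment-classifier scores for observed evaluative texts.
text_sentiment = {(0, 1): 0.9, (1, 2): 0.8}    # A praises B, B praises C

def set_entry(y, i, j, v):
    z = y.copy()
    z[i, j] = v
    return z

def loss(y):
    total = 0.0
    # Rule 1: predicted opinions should agree with the textual sentiment.
    for (i, j), s in text_sentiment.items():
        total += (y[i, j] - s) ** 2
    # Rule 2: structural balance as a Lukasiewicz-style hinge penalty:
    # "if i likes j and j likes k, then i likes k".
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) == 3:
                    total += max(0.0, y[i, j] + y[j, k] - 1.0 - y[i, k])
    return total

# Crude coordinate search over a grid; the real model instead runs
# efficient convex inference over the hinge-loss potentials.
for _ in range(300):
    i, j = rng.integers(n), rng.integers(n)
    vals = np.linspace(0.0, 1.0, 21)
    y[i, j] = min(vals, key=lambda v: loss(set_entry(y, i, j, v)))

print(np.round(y, 2))   # the transitivity rule pushes y[0, 2] (A's view of C) upward
```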