1-16 of 16
Roi Reichart
Journal Articles
Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models
Open Access
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2025) 13: 142–166.
Published: 17 February 2025
Journal Articles
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2023) 11: 351–366.
Published: 20 April 2023
On the Robustness of Dialogue History Representation in Conversational Question Answering: A Comprehensive Study and a New Prompt-based Method
Abstract
Most work on modeling the conversation history in Conversational Question Answering (CQA) reports a single main result on a common CQA benchmark. While existing models show impressive results on CQA leaderboards, it remains unclear whether they are robust to shifts in setting (sometimes to more realistic ones), training data size (e.g., from large to small sets) and domain. In this work, we design and conduct the first large-scale robustness study of history modeling approaches for CQA. We find that high benchmark scores do not necessarily translate to strong robustness, and that various methods can perform extremely differently under different settings. Equipped with the insights from our study, we design a novel prompt-based history modeling approach and demonstrate its strong robustness across various settings. Our approach is inspired by existing methods that highlight historic answers in the passage. However, instead of highlighting by modifying the passage token embeddings, we add textual prompts directly in the passage text. Our approach is simple, easy to plug into practically any model, and highly effective, thus we recommend it as a starting point for future model developers. We also hope that our study and insights will raise awareness of the importance of robustness-focused evaluation, in addition to obtaining high leaderboard scores, leading to better CQA systems.
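As a rough illustration of the idea of adding textual prompts directly in the passage text, the sketch below wraps each earlier answer span in a numbered marker. The marker format (`<1> ... </1>`), the function name `highlight_history`, and the use of character offsets are assumptions made for this example only; the abstract does not specify the actual prompt format.

```python
def highlight_history(passage, history_spans):
    """Wrap earlier answer spans in textual markers inside the passage.

    history_spans: list of (start, end) character offsets, oldest turn first.
    Spans are assumed not to overlap (an assumption of this sketch).
    """
    # Number the spans by conversation turn, then insert markers from
    # right to left so that earlier offsets remain valid.
    numbered = sorted(
        enumerate(history_spans, start=1),
        key=lambda item: item[1][0],
        reverse=True,
    )
    for turn, (start, end) in numbered:
        marked = f"<{turn}> {passage[start:end]} </{turn}>"
        passage = passage[:start] + marked + passage[end:]
    return passage


example = highlight_history(
    "Paris is the capital of France.",
    [(0, 5), (24, 30)],
)
print(example)
```

Because the highlighting lives in the text itself, the marked passage can be fed to any reader model without changing its embedding layer, which matches the "easy to plug into practically any model" claim in the abstract.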
Journal Articles
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2022) 10: 1209–1228.
Published: 07 November 2022
Multi-task Active Learning for Pre-trained Transformer-based Models
Abstract
Multi-task learning, in which several tasks are jointly learned by a single model, allows NLP models to share information from multiple annotations and may facilitate better predictions when the tasks are inter-related. This technique, however, requires annotating the same text with multiple annotation schemes, which may be costly and laborious. Active learning (AL) has been demonstrated to optimize annotation processes by iteratively selecting unlabeled examples whose annotation is most valuable for the NLP model. Yet, multi-task active learning (MT-AL) has not been applied to state-of-the-art pre-trained Transformer-based NLP models. This paper aims to close this gap. We explore various multi-task selection criteria in three realistic multi-task scenarios, reflecting different relations between the participating tasks, and demonstrate the effectiveness of multi-task compared to single-task selection. Our results suggest that MT-AL can be effectively used in order to minimize annotation efforts for multi-task NLP models.
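To make the notion of a multi-task selection criterion concrete, here is a minimal sketch of one possible criterion: score each unlabeled example by the average predictive entropy across tasks and pick the top-k. This is a generic uncertainty-based illustration, not necessarily one of the criteria studied in the paper; the function names and the input layout are assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(pool_predictions, k):
    """Pick the k examples with the highest mean entropy across tasks.

    pool_predictions maps an example id to a list of per-task
    probability distributions (hypothetical layout for this sketch).
    """
    scores = {
        example_id: sum(entropy(dist) for dist in dists) / len(dists)
        for example_id, dists in pool_predictions.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


pool = {
    "a": [[0.5, 0.5], [0.5, 0.5]],      # uncertain on both tasks
    "b": [[0.9, 0.1], [0.5, 0.5]],      # uncertain on one task
    "c": [[0.99, 0.01], [0.99, 0.01]],  # confident on both tasks
}
print(select_batch(pool, 2))
```

A single-task criterion would use only one of the distributions per example; averaging is just one of several ways the per-task scores could be combined.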
Journal Articles
Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond
Open Access
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2022) 10: 1138–1158.
Published: 18 October 2022
Abstract
A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the challenges and opportunities in the application of causal inference to the textual domain, with its unique properties. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects with text, encompassing settings where text is used as an outcome, treatment, or to address confounding. In addition, we explore potential uses of causal inference to improve the robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the NLP community.
Journal Articles
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2022) 10: 414–433.
Published: 11 April 2022
PADA: Example-based Prompt Learning for on-the-fly Adaptation to Unseen Domains
Abstract
Natural Language Processing algorithms have made incredible progress, but they still struggle when applied to out-of-distribution examples. We address a challenging and underexplored version of this domain adaptation problem, where an algorithm is trained on several source domains, and then applied to examples from unseen domains that are unknown at training time. Particularly, no examples, labeled or unlabeled, or any other knowledge about the target domain are available to the algorithm at training time. We present PADA: An example-based autoregressive Prompt learning algorithm for on-the-fly Any-Domain Adaptation, based on the T5 language model. Given a test example, PADA first generates a unique prompt for it and then, conditioned on this prompt, labels the example with respect to the NLP prediction task. PADA is trained to generate a prompt that is a token sequence of unrestricted length, consisting of Domain Related Features (DRFs) that characterize each of the source domains. Intuitively, the generated prompt is a unique signature that maps the test example to a semantic space spanned by the source domains. In experiments with 3 tasks (text classification and sequence tagging), for a total of 14 multi-source adaptation scenarios, PADA substantially outperforms strong baselines.
Journal Articles
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2022) 10: 307–324.
Published: 25 March 2022
Designing an Automatic Agent for Repeated Language–based Persuasion Games
Abstract
Persuasion games are fundamental in economics and AI research and serve as the basis for important applications. However, work on this setup assumes communication with stylized messages that do not consist of rich human language. In this paper we consider a repeated sender (expert) – receiver (decision maker) game, where the sender is fully informed about the state of the world and aims to persuade the receiver to accept a deal by sending one of several possible natural language reviews. We design an automatic expert that plays this repeated game, aiming to achieve the maximal payoff. Our expert is implemented within the Monte Carlo Tree Search (MCTS) algorithm, with deep learning models that exploit behavioral and linguistic signals in order to predict the next action of the decision maker, and the future payoff of the expert given the state of the game and a candidate review. We demonstrate the superiority of our expert over strong baselines and its adaptability to different decision makers and potential proposed deals.
Journal Articles
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2021) 9: 1355–1373.
Published: 06 December 2021
Model Compression for Domain Adaptation through Causal Effect Estimation
Abstract
Recent improvements in the predictive quality of natural language processing systems are often dependent on a substantial increase in the number of model parameters. This has led to various attempts at compressing such models, but existing methods have not considered the differences in the predictive power of various model components or in the generalizability of the compressed models. To understand the connection between model compression and out-of-distribution generalization, we define the task of compressing language representation models such that they perform best in a domain adaptation setting. We choose to address this problem from a causal perspective, attempting to estimate the average treatment effect (ATE) of a model component, such as a single layer, on the model’s predictions. Our proposed ATE-guided Model Compression scheme (AMoC) generates many model candidates, differing by the model components that were removed. Then, we select the best candidate through a stepwise regression model that utilizes the ATE to predict the expected performance on the target domain. AMoC outperforms strong baselines on dozens of domain pairs across three text classification and sequence tagging tasks.
Journal Articles
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2021) 9: 410–428.
Published: 26 April 2021
Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages
Abstract
Most combinations of NLP tasks and language varieties lack in-domain examples for supervised training because of the paucity of annotated data. How can neural models make sample-efficient generalizations from task–language combinations with available data to low-resource ones? In this work, we propose a Bayesian generative model for the space of neural parameters. We assume that this space can be factorized into latent variables for each language and each task. We infer the posteriors over such latent variables based on data from seen task–language combinations through variational inference. This enables zero-shot classification on unseen combinations at prediction time. For instance, given training data for named entity recognition (NER) in Vietnamese and for part-of-speech (POS) tagging in Wolof, our model can perform accurate predictions for NER in Wolof. In particular, we experiment with a typologically diverse sample of 33 languages from 4 continents and 11 families, and show that our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods. Our code is available at github.com/cambridgeltl/parameter-factorization.
Journal Articles
PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models
Open Access
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2020) 8: 504–521.
Published: 01 July 2020
Abstract
Pivot-based neural representation models have led to significant progress in domain adaptation for NLP. However, previous research following this approach utilizes only labeled data from the source domain and unlabeled data from the source and target domains, but neglects to incorporate massive unlabeled corpora that are not necessarily drawn from these domains. To alleviate this, we propose PERL: A representation learning model that extends contextualized word embedding models such as BERT (Devlin et al., 2019) with pivot-based fine-tuning. PERL outperforms strong baselines across 22 sentiment classification domain adaptation setups, improves in-domain model performance, yields effective reduced-size models, and increases model stability.
Journal Articles
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2019) 7: 695–713.
Published: 01 December 2019
Deep Contextualized Self-training for Low Resource Dependency Parsing
Abstract
Neural dependency parsing has proven very effective, achieving state-of-the-art results on numerous domains and languages. Unfortunately, it requires large amounts of labeled data, which is costly and laborious to create. In this paper we propose a self-training algorithm that alleviates this annotation bottleneck by training a parser on its own output. Our Deep Contextualized Self-training (DCST) algorithm utilizes representation models trained on sequence labeling tasks that are derived from the parser’s output when applied to unlabeled data, and integrates these models with the base parser through a gating mechanism. We conduct experiments across multiple languages, both in low resource in-domain and in cross-domain setups, and demonstrate that DCST substantially outperforms traditional self-training as well as recent semi-supervised training methods.
Journal Articles
Perturbation Based Learning for Structured NLP Tasks with Application to Dependency Parsing
Open Access
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2019) 7: 643–659.
Published: 01 September 2019
Abstract
The best solution of structured prediction models in NLP is often inaccurate because of the limited expressive power of the model or non-exact parameter estimation. One way to mitigate this problem is sampling candidate solutions from the model’s solution space, reasoning that effective exploration of this space should yield high-quality solutions. Unfortunately, sampling is often computationally hard and many works hence back off to sub-optimal strategies, such as extraction of the best scoring solutions of the model, which are not as diverse as sampled solutions. In this paper we propose a perturbation-based approach where sampling from a probabilistic model is computationally efficient. We present a learning algorithm for the variance of the perturbations, and empirically demonstrate its importance. Moreover, while finding the argmax in our model is intractable, we propose an efficient and effective approximation. We apply our framework to cross-lingual dependency parsing across 72 corpora from 42 languages and to lightly supervised dependency parsing across 13 corpora from 12 languages, and demonstrate strong results in terms of both the quality of the entire solution list and of the final solution.
Journal Articles
Language Modeling for Morphologically Rich Languages: Character-Aware Modeling for Word-Level Prediction
Open Access
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2018) 6: 451–465.
Published: 01 July 2018
Abstract
Neural architectures are prominent in the construction of language models (LMs). However, word-level prediction is typically agnostic of subword-level information (characters and character sequences) and operates over a closed vocabulary, consisting of a limited word set. Indeed, while subword-aware models boost performance across a variety of NLP tasks, previous work did not evaluate the ability of these models to assist next-word prediction in language modeling tasks. Such subword-level informed models should be particularly effective for morphologically-rich languages (MRLs) that exhibit high type-to-token ratios. In this work, we present a large-scale LM study on 50 typologically diverse languages covering a wide variety of morphological systems, and offer new LM benchmarks to the community, while considering subword-level information. The main technical contribution of our work is a novel method for injecting subword-level information into semantic word vectors, integrated into the neural language modeling training, to facilitate word-level prediction. We conduct experiments in the LM setting where the number of infrequent words is large, and demonstrate strong perplexity gains across our 50 languages, especially for morphologically-rich languages. Our code and data sets are publicly available.
Journal Articles
Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets
Open Access
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2017) 5: 471–486.
Published: 01 November 2017
Abstract
With the ever growing amount of textual data from a large variety of languages, domains, and genres, it has become standard to evaluate NLP algorithms on multiple datasets in order to ensure a consistent performance across heterogeneous setups. However, such multiple comparisons pose significant challenges to traditional statistical analysis methods in NLP and can lead to erroneous conclusions. In this paper we propose a Replicability Analysis framework for a statistically sound analysis of multiple comparisons between algorithms for NLP tasks. We discuss the theoretical advantages of this framework over the current, statistically unjustified, practice in the NLP literature, and demonstrate its empirical value across four applications: multi-domain dependency parsing, multilingual POS tagging, cross-domain sentiment classification and word similarity prediction.
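To show why comparing algorithms on several datasets calls for a correction, the sketch below applies Holm's step-down procedure, a standard multiple-comparison method, to a list of per-dataset p-values. It is offered as a generic illustration of controlling errors across datasets; it should not be read as the exact procedure of the Replicability Analysis framework, and the function name and example p-values are made up for this sketch.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans, aligned with p_values, marking which
    null hypotheses are rejected while controlling the family-wise
    error rate at level alpha.
    """
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as more hypotheses are rejected.
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # stop at the first non-rejection
    return rejected


# Hypothetical p-values from comparing two algorithms on four datasets.
p_values = [0.01, 0.04, 0.03, 0.005]
print(holm_reject(p_values))
```

Note that three of the four hypothetical p-values fall below 0.05 on their own, yet only two survive the correction, which is the kind of discrepancy the abstract warns about.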
Journal Articles
Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints
Open Access
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2017) 5: 309–324.
Published: 01 September 2017
Abstract
We present Attract-Repel, an algorithm for improving the semantic quality of word vectors by injecting constraints extracted from lexical resources. Attract-Repel facilitates the use of constraints from mono- and cross-lingual resources, yielding semantically specialized cross-lingual vector spaces. Our evaluation shows that the method can make use of existing cross-lingual lexicons to construct high-quality vector spaces for a plethora of different languages, facilitating semantic transfer from high- to lower-resource ones. The effectiveness of our approach is demonstrated with state-of-the-art results on semantic similarity datasets in six languages. We next show that Attract-Repel-specialized vectors boost performance in the downstream task of dialogue state tracking (DST) across multiple languages. Finally, we show that cross-lingual vector spaces produced by our algorithm facilitate the training of multilingual DST models, which brings further performance improvements.
Journal Articles
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2015) 3: 131–143.
Published: 01 March 2015
Unsupervised Declarative Knowledge Induction for Constraint-Based Learning of Information Structure in Scientific Documents
Abstract
Inferring the information structure of scientific documents is useful for many NLP applications. Existing approaches to this task require substantial human effort. We propose a framework for constraint learning that reduces human involvement considerably. Our model uses topic models to identify latent topics and their key linguistic features in input documents, induces constraints from this information and maps sentences to their dominant information structure categories through a constrained unsupervised model. When the induced constraints are combined with a fully unsupervised model, the resulting model challenges existing lightly supervised feature-based models as well as unsupervised models that use manually constructed declarative knowledge. Our results demonstrate that useful declarative knowledge can be learned from data with very limited human involvement.
Journal Articles
Publisher: Journals Gateway
Transactions of the Association for Computational Linguistics (2014) 2: 285–296.
Published: 01 October 2014
Multi-Modal Models for Concrete and Abstract Concept Meaning
Abstract
Multi-modal models that learn semantic representations from both linguistic and perceptual input outperform language-only models on a range of evaluations, and better reflect human concept acquisition. Most perceptual input to such models corresponds to concrete noun concepts and the superiority of the multi-modal approach has only been established when evaluating on such concepts. We therefore investigate which concepts can be effectively learned by multi-modal models. We show that concreteness determines both which linguistic features are most informative and the impact of perceptual input in such models. We then introduce ridge regression as a means of propagating perceptual information from concrete nouns to more abstract concepts that is more robust than previous approaches. Finally, we present weighted gram matrix combination, a means of combining representations from distinct modalities that outperforms alternatives when both modalities are sufficiently rich.