1–6 of 6 results
Roberto Navigli
Journal Articles
Publisher: Journals Gateway
Computational Linguistics 1–71.
Published: 12 March 2025
Abstract
DiBiMT: A Gold Evaluation Benchmark for Studying Lexical Ambiguity in Machine Translation
Despite the remarkable progress made in the field of Machine Translation (MT), current systems still struggle when translating ambiguous words, especially when these express infrequent meanings. In order to investigate and analyze the impact of lexical ambiguity on automatic translations, several tasks and evaluation benchmarks have been proposed over the course of the last few years. However, work in this research direction suffers from critical shortcomings. Indeed, existing evaluation datasets are not entirely manually curated, which significantly compromises their reliability. Furthermore, current literature fails to provide detailed insights into the nature of the errors produced by models translating ambiguous words, lacking a thorough manual analysis across languages. With a view to overcoming these limitations, we propose Disambiguation Biases in MT (DiBiMT), an entirely manually curated evaluation benchmark for investigating disambiguation biases in eight language combinations and assessing the ability of both commercial and non-commercial systems to handle ambiguous words. We also examine and detail the errors produced by models in this scenario by carrying out a manual error analysis in all language pairs. Additionally, we perform an extensive array of experiments aimed at studying the behavior of models when dealing with ambiguous words. Finally, we show the ineffectiveness of standard MT evaluation settings for assessing the disambiguation capabilities of systems and highlight the need for additional efforts in this research direction and ad-hoc testbeds such as DiBiMT. Our benchmark is available at: https://nlp.uniroma1.it/dibimt/.
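The core idea of such a benchmark can be illustrated with a minimal sketch: each test item pairs a source sentence containing an ambiguous word with sets of acceptable and sense-confusing target lemmas, and a system's output is judged by which set it hits. The data format, field names, and example lemmas below are invented for illustration and are not the benchmark's actual schema.

```python
# Minimal sketch of a sense-aware MT check in the spirit of DiBiMT
# (hypothetical data format, not the benchmark's actual schema).
from dataclasses import dataclass, field

@dataclass
class AmbiguityItem:
    source: str                                # sentence with an ambiguous word
    good: set = field(default_factory=set)     # acceptable target lemmas
    bad: set = field(default_factory=set)      # lemmas revealing a wrong sense

def judge(item, translation):
    """Return 'good', 'bad', or 'miss' for one translated sentence."""
    tokens = set(translation.lower().split())
    if tokens & item.good:
        return "good"
    if tokens & item.bad:
        return "bad"
    return "miss"

item = AmbiguityItem(
    source="He sat on the bank of the river.",
    good={"riva", "sponda"},   # Italian lemmas for the river sense
    bad={"banca"},             # the financial-institution sense
)
print(judge(item, "Si sedette sulla riva del fiume"))   # good
print(judge(item, "Si sedette sulla banca del fiume"))  # bad
```

A real benchmark would of course lemmatize the output and distinguish further error types; the point here is only the good/bad/miss decision at the heart of the evaluation.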
Computational Linguistics (2014) 40 (4): 837–881.
Published: 01 December 2014
Abstract
A Large-Scale Pseudoword-Based Evaluation Framework for State-of-the-Art Word Sense Disambiguation
The evaluation of several tasks in lexical semantics is often limited by the lack of large amounts of manual annotations, not only for training purposes, but also for testing purposes. Word Sense Disambiguation (WSD) is a case in point, as hand-labeled datasets are particularly hard and time-consuming to create. Consequently, evaluations tend to be performed on a small scale, which does not allow for in-depth analysis of the factors that determine a system's performance. In this paper we address this issue by means of a realistic simulation of large-scale evaluation for the WSD task. We do this by providing two main contributions: First, we put forward two novel approaches to the wide-coverage generation of semantically aware pseudowords (i.e., artificial words capable of modeling real polysemous words); second, we leverage the most suitable type of pseudoword to create large pseudosense-annotated corpora, which enable a large-scale experimental framework for the comparison of state-of-the-art supervised and knowledge-based algorithms. Using this framework, we study the impact of supervision and knowledge on the two major disambiguation paradigms and perform an in-depth analysis of the factors which affect their performance.
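The classic pseudoword construction the paper builds on can be sketched in a few lines: two (ideally monosemous) words are fused into one artificial ambiguous word, and each original word serves as one "pseudosense" gold label, so sense-annotated data comes for free. The word pair and sentences below are toy examples, not the paper's semantically aware generation method.

```python
# Toy illustration of the classic pseudoword idea: occurrences of either
# source word are replaced by the fused pseudoword, and the replaced
# word becomes the gold pseudosense label for that sentence.
def make_pseudoword_corpus(sentences, w1, w2):
    pseudo = f"{w1}_{w2}"
    corpus = []
    for sent in sentences:
        tokens = sent.split()
        if w1 in tokens or w2 in tokens:
            label = w1 if w1 in tokens else w2
            corpus.append(
                (" ".join(pseudo if t in (w1, w2) else t for t in tokens), label)
            )
    return corpus

sents = ["the banana was ripe", "she opened the door", "no match here"]
for text, gold in make_pseudoword_corpus(sents, "banana", "door"):
    print(gold, "->", text)
```

The paper's contribution lies precisely in choosing the component words so that the artificial ambiguity mimics real polysemy, rather than pairing arbitrary words as this sketch does.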
Computational Linguistics (2013) 39 (3): 665–707.
Published: 01 September 2013
Abstract
OntoLearn Reloaded: A Graph-Based Algorithm for Taxonomy Induction
In 2004 we published in this journal an article describing OntoLearn, one of the first systems to automatically induce a taxonomy from documents and Web sites. Since then, OntoLearn has continued to be an active area of research in our group and has become a reference work within the community. In this paper we describe our next-generation taxonomy learning methodology, which we name OntoLearn Reloaded. Unlike many taxonomy learning approaches in the literature, our novel algorithm learns both concepts and relations entirely from scratch via the automated extraction of terms, definitions, and hypernyms. This results in a very dense, cyclic and potentially disconnected hypernym graph. The algorithm then induces a taxonomy from this graph via optimal branching and a novel weighting policy. Our experiments show that we obtain high-quality results, both when building brand-new taxonomies and when reconstructing sub-hierarchies of existing taxonomies.
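The branching step can be illustrated with a deliberately simplified sketch: given a noisy weighted hypernym graph, attach each node to its highest-weight candidate parent. The full Chu-Liu/Edmonds optimal-branching algorithm additionally contracts any cycles this greedy phase creates; the example graph and weights below are invented and chosen so that no contraction is needed.

```python
# Simplified sketch of the optimal-branching idea used to turn a dense,
# cyclic hypernym graph into a taxonomy: keep, for every node except
# the root, only its heaviest incoming (hypernym) edge. This is the
# greedy first phase of Chu-Liu/Edmonds, without cycle contraction.
def greedy_branching(edges, root):
    """edges: list of (parent, child, weight) hypernym candidates.
    Returns a child -> chosen parent mapping."""
    best = {}
    for u, v, w in edges:
        if v != root and (v not in best or w > best[v][1]):
            best[v] = (u, w)
    return {child: parent for child, (parent, _) in best.items()}

edges = [
    ("entity", "animal", 0.9), ("entity", "dog", 0.2),
    ("animal", "dog", 0.8),   ("dog", "animal", 0.1),  # noisy cycle edge
    ("animal", "cat", 0.7),   ("entity", "cat", 0.3),
]
print(greedy_branching(edges, "entity"))
# {'animal': 'entity', 'dog': 'animal', 'cat': 'animal'}
```

Each node ends up with exactly one parent, so the cyclic candidate graph collapses into a tree; the paper's weighting policy determines the edge weights that this selection optimizes.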
Computational Linguistics (2013) 39 (3): 709–754.
Published: 01 September 2013
Abstract
Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction
Web search result clustering aims to facilitate information search on the Web. Rather than the results of a query being presented as a flat list, they are grouped on the basis of their similarity and subsequently shown to the user as a list of clusters. Each cluster is intended to represent a different meaning of the input query, thus taking into account the lexical ambiguity (i.e., polysemy) issue. Existing Web clustering methods typically rely on some shallow notion of textual similarity between search result snippets, however. As a result, text snippets with no word in common tend to be clustered separately even if they share the same meaning, whereas snippets with words in common may be grouped together even if they refer to different meanings of the input query. In this article we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction. Key to our approach is to first acquire the various senses (i.e., meanings) of an ambiguous query and then cluster the search results based on their semantic similarity to the word senses induced. Our experiments, conducted on data sets of ambiguous queries, show that our approach outperforms both Web clustering and search engines.
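The sense-assignment step described above can be sketched as follows: once each induced sense of the query is represented as a bag of related words, every snippet joins the cluster of the sense it overlaps most. The induced senses, similarity measure (plain word overlap), and snippets below are invented for illustration; the paper induces senses from raw text with a graph-based algorithm rather than assuming them.

```python
# Toy sketch of clustering snippets by induced word senses: each sense
# is a set of related words, and a snippet is assigned to the sense
# with the largest word overlap.
def cluster_snippets(snippets, senses):
    clusters = {name: [] for name in senses}
    for snip in snippets:
        words = set(snip.lower().split())
        best = max(senses, key=lambda s: len(words & senses[s]))
        clusters[best].append(snip)
    return clusters

senses = {  # hypothetical induced senses of the query "jaguar"
    "animal": {"cat", "wildlife", "jungle", "predator"},
    "car":    {"engine", "luxury", "model", "dealer"},
}
snippets = [
    "jaguar predator of the jungle",
    "new jaguar model with powerful engine",
]
print(cluster_snippets(snippets, senses))
```

Because snippets are matched against sense representations rather than against each other, two snippets with no words in common can still land in the same cluster, which is exactly the failure mode of shallow snippet-to-snippet similarity that the article targets.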
Computational Linguistics (2006) 32 (2): 273–281.
Published: 01 June 2006
Abstract
Consistent Validation of Manual and Automatic Sense Annotations with the Aid of Semantic Graphs
The task of annotating texts with senses from a computational lexicon is widely recognized to be complex and often subjective. Although strategies like interannotator agreement and voting can be applied to deal with the divergences between sense taggers, the consistency of sense choices with respect to the reference dictionary is not always guaranteed. In this article, we introduce Valido, a visual tool for the validation of manual and automatic sense annotations. The tool employs semantic interconnection patterns to smooth possible divergences and support consistent decision making.
Computational Linguistics (2004) 30 (2): 151–179.
Published: 01 June 2004
Abstract
Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites
We present a method and a tool, OntoLearn, aimed at the extraction of domain ontologies from Web sites, and more generally from documents shared among the members of virtual organizations. OntoLearn first extracts a domain terminology from available documents. Then, complex domain terms are semantically interpreted and arranged in a hierarchical fashion. Finally, a general-purpose ontology, WordNet, is trimmed and enriched with the detected domain concepts. The major novel aspect of this approach is semantic interpretation, that is, the association of a complex concept with a complex term. This involves finding the appropriate WordNet concept for each word of a terminological string and the appropriate conceptual relations that hold among the concept components. Semantic interpretation is based on a new word sense disambiguation algorithm, called structural semantic interconnections.