Squibs and Discussions
Computational Linguistics (2007) 33 (3): 293–303.
Published: 01 September 2007
Automatic word alignment plays a critical role in statistical machine translation. Unfortunately, the relationship between alignment quality and statistical machine translation performance has not been well understood. In the recent literature, the alignment task has frequently been decoupled from the translation task and assumptions have been made about measuring alignment quality for machine translation which, it turns out, are not justified. In particular, none of the tens of papers published over the last five years has shown that significant decreases in alignment error rate (AER) result in significant increases in translation performance. This paper explains this state of affairs and presents steps towards measuring alignment quality in a way which is predictive of statistical machine translation performance.
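For reference, the AER measure under discussion (Och and Ney 2003) scores a hypothesized alignment A against gold sure links S and possible links P as AER = 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|). A minimal sketch of that computation; the toy link sets are illustrative:

```python
def aer(hypothesis, sure, possible):
    """Alignment error rate (Och & Ney 2003): 1 - (|A&S| + |A&P|) / (|A| + |S|).

    hypothesis: set of (src, tgt) index pairs proposed by the aligner (A).
    sure:       gold links annotators judged necessary (S).
    possible:   gold links judged acceptable; treated as a superset of S (P).
    """
    a, s = set(hypothesis), set(sure)
    p = set(possible) | s
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

# A hypothesis covering all sure links and only acceptable links scores 0.
sure = {(0, 0), (1, 1)}
possible = {(1, 2)}
print(aer({(0, 0), (1, 1), (1, 2)}, sure, possible))  # 0.0
print(aer({(0, 0), (2, 2)}, sure, possible))          # 0.5: one sure link missed, one spurious
```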
Computational Linguistics (2007) 33 (1): 3–8.
Published: 01 March 2007
Many annotation projects have shown that the quality of manual annotations is often not as good as would be desirable for reliable data analysis. Identifying the main sources of poor annotation quality must therefore be a major concern. Generalizability theory is a valuable tool for this purpose, because it allows for the differentiation and detailed analysis of the factors that influence annotation quality. In this article we present the basic concepts of generalizability theory and give an example of its application based on published data.
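To make the idea concrete, the sketch below runs a toy G study for a fully crossed items × coders design with one rating per cell, estimating variance components from ANOVA expected mean squares and a generalizability coefficient for relative decisions. The design, data, and formulas are a standard textbook setup chosen for illustration, not taken from the article:

```python
import numpy as np

def g_study(ratings):
    """Variance components for a crossed items x coders design with one
    rating per cell, via ANOVA expected mean squares."""
    n_i, n_c = ratings.shape
    grand = ratings.mean()
    ss_items = n_c * ((ratings.mean(axis=1) - grand) ** 2).sum()
    ss_coders = n_i * ((ratings.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((ratings - grand) ** 2).sum() - ss_items - ss_coders
    ms_items = ss_items / (n_i - 1)
    ms_coders = ss_coders / (n_c - 1)
    ms_res = ss_res / ((n_i - 1) * (n_c - 1))
    var_items = max(0.0, (ms_items - ms_res) / n_c)    # true item differences
    var_coders = max(0.0, (ms_coders - ms_res) / n_i)  # coder strictness
    return var_items, var_coders, ms_res               # ms_res = residual variance

# Toy data: 4 items rated by 3 coders on a 1-5 scale.
ratings = np.array([[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 1]], dtype=float)
var_i, var_c, var_e = g_study(ratings)
n_c = ratings.shape[1]
# Generalizability coefficient for relative decisions with n_c coders.
print(var_i / (var_i + var_e / n_c))  # close to 1: coders rank items consistently
```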
Computational Linguistics (2006) 32 (1): 5–12.
Published: 01 March 2006
Choi, Wiemer-Hastings, and Moore (2001) proposed to use Latent Semantic Analysis (LSA) to extract semantic knowledge from corpora in order to improve the accuracy of a text segmentation algorithm. By comparing the accuracy of the very same algorithm, depending on whether or not it takes into account complementary semantic knowledge, they were able to show the benefit derived from such knowledge. In their experiments, semantic knowledge was, however, acquired from a corpus containing the texts to be segmented in the test phase. If this hyper-specificity of the LSA corpus explains most of the benefit, one may wonder whether it is possible to use LSA to acquire generic semantic knowledge that can be used to segment new texts. The two experiments reported here show that the presence of the test materials in the LSA corpus has an important effect, but also that the generic semantic knowledge derived from large corpora clearly improves the segmentation accuracy.
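As a rough sketch of the underlying machinery (not the authors' exact experimental setup): derive LSA vectors for text blocks by truncated SVD of a term-by-block count matrix, then use the cosine similarity of adjacent blocks as segmentation evidence, placing boundaries where similarity dips:

```python
import numpy as np

def adjacent_lsa_similarities(term_block_counts, k=100):
    """LSA vectors for text blocks via truncated SVD of a (terms x blocks)
    count matrix; returns the cosine similarity between each pair of
    adjacent blocks. Low values are candidate topic boundaries."""
    u, s, vt = np.linalg.svd(term_block_counts, full_matrices=False)
    k = min(k, len(s))
    blocks = vt[:k].T * s[:k]                      # one k-dim vector per block
    blocks /= np.linalg.norm(blocks, axis=1, keepdims=True) + 1e-12
    return (blocks[:-1] * blocks[1:]).sum(axis=1)  # cosine of adjacent rows

# Toy matrix: 5 terms x 4 blocks; the topic shifts after block 2.
m = np.array([[3, 2, 0, 0], [2, 3, 0, 0], [1, 1, 0, 1],
              [0, 0, 3, 2], [0, 0, 2, 3]], dtype=float)
print(adjacent_lsa_similarities(m, k=2))  # similarity dips between blocks 2 and 3
```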
Computational Linguistics (2006) 32 (1): 1–3.
Published: 01 March 2006
WordNet, a lexical database for English that is extensively used by computational linguists, has not previously distinguished hyponyms that are classes from hyponyms that are instances. This note describes an attempt to draw that distinction and proposes a simple way to incorporate the results into future versions of WordNet.
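The class/instance distinction proposed here was subsequently adopted in WordNet, and tools now expose it; NLTK's WordNet interface is one example (the printed synsets are indicative, not guaranteed output):

```python
from nltk.corpus import wordnet as wn  # requires the data: nltk.download('wordnet')

# An instance: a particular entity, linked to its class by an instance relation.
washington = wn.synsets('George_Washington')[0]
print(washington.instance_hypernyms())    # e.g. a 'president of the United States' synset

# An ordinary class hyponym, by contrast, uses the plain hypernym relation.
print(wn.synset('dog.n.01').hypernyms())  # e.g. [Synset('canine.n.02'), ...]
```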
Computational Linguistics (2005) 31 (3): 289–296.
Published: 01 September 2005
Agreement statistics play an important role in the evaluation of coding schemes for discourse and dialogue. Unfortunately there is a lack of understanding regarding appropriate agreement measures and how their results should be interpreted. In this article we describe the role of agreement measures and argue that only chance-corrected measures that assume a common distribution of labels for all coders are suitable for measuring agreement in reliability studies. We then provide recommendations for how reliability should be inferred from the results of agreement statistics.
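For concreteness, here is a minimal sketch of a chance-corrected coefficient that assumes a common label distribution for all coders (Scott's pi; for two coders this matches the pooled-distribution measure of Siegel and Castellan often used in this literature). The toy labels are illustrative:

```python
from collections import Counter

def pi_pooled(a, b):
    """Chance-corrected agreement with a single label distribution shared
    by both coders: (A_o - A_e) / (1 - A_e)."""
    n = len(a)
    a_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    pooled = Counter(a) + Counter(b)                   # one distribution for all coders
    a_e = sum((c / (2 * n)) ** 2 for c in pooled.values())
    return (a_o - a_e) / (1 - a_e)

coder1 = ["yes", "yes", "no", "yes", "no", "no"]
coder2 = ["yes", "no", "no", "yes", "no", "yes"]
print(pi_pooled(coder1, coder2))  # about 0.33: modest agreement beyond chance
```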
Computational Linguistics (2005) 31 (1): 15–24.
Published: 01 March 2005
This article challenges the received wisdom that template-based approaches to the generation of language are necessarily inferior to other approaches as regards their maintainability, linguistic well-foundedness, and quality of output. Some recent NLG systems that call themselves “template-based” will illustrate our claims.
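A toy illustration of the point: a "template" that nonetheless makes linguistic decisions (number agreement, conditional aggregation) at generation time. The example is hypothetical, not drawn from the systems the article discusses:

```python
def weather_sentence(city, temp_c, n_showers):
    """A template with embedded linguistic processing: agreement and
    aggregation are computed, not hard-coded strings."""
    shower = "shower" if n_showers == 1 else "showers"
    rain = (f"with {n_showers} {shower} expected" if n_showers
            else "with no rain expected")
    return f"In {city} it will be {temp_c} degrees, {rain}."

print(weather_sentence("Aberdeen", 12, 1))  # ... 12 degrees, with 1 shower expected.
print(weather_sentence("Aberdeen", 14, 0))  # ... 14 degrees, with no rain expected.
```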
Computational Linguistics (2004) 30 (2): 227–235.
Published: 01 June 2004
In a recent article, Carrasco and Forcada (June 2002) presented two algorithms: one for incremental addition of strings to the language of a minimal, deterministic, cyclic automaton, and one for incremental removal of strings from the automaton. The first algorithm is a generalization of the “algorithm for unsorted data”—the second of the two incremental algorithms for construction of minimal, deterministic, acyclic automata presented in Daciuk et al. (2000). We show that the other algorithm in the older article—the “algorithm for sorted data”—can be generalized in a similar way. The new algorithm is faster than the algorithm for addition of strings presented in Carrasco and Forcada's article, as it handles each state only once.
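For context, here is a compact Python sketch of the older "algorithm for sorted data" of Daciuk et al. (2000), the one the squib generalizes: because input arrives in lexicographic order, each state is minimized (registered) exactly once, as soon as no later word can pass through it:

```python
class State:
    def __init__(self):
        self.edges = {}     # char -> State
        self.final = False

    def signature(self):
        # Children are already canonical when this is called, so identity works.
        return (self.final, tuple(sorted((c, id(s)) for c, s in self.edges.items())))

def build_minimal_dawg(words):
    """Incremental construction of a minimal, deterministic, acyclic automaton
    from lexicographically sorted input (after Daciuk et al. 2000)."""
    register, root, previous = {}, State(), ""

    def replace_or_register(state):
        # Input is sorted, so the last-added child is the largest edge label.
        char, child = max(state.edges.items())
        if child.edges:
            replace_or_register(child)
        sig = child.signature()
        if sig in register:
            state.edges[char] = register[sig]  # merge with an equivalent state
        else:
            register[sig] = child              # this state is now handled, once

    for word in words:
        assert word >= previous, "input must be sorted"
        i = 0                                  # length of common prefix
        while i < min(len(word), len(previous)) and word[i] == previous[i]:
            i += 1
        node = root
        for c in word[:i]:
            node = node.edges[c]
        if node.edges:                         # minimize the previous word's
            replace_or_register(node)          # now-finished suffix ...
        for c in word[i:]:                     # ... then graft the new suffix
            node.edges[c] = State()
            node = node.edges[c]
        node.final = True
        previous = word

    if root.edges:
        replace_or_register(root)
    return root
```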
Computational Linguistics (2004) 30 (1): 95–101.
Published: 01 March 2004
In recent years, the kappa coefficient of agreement has become the de facto standard for evaluating intercoder agreement for tagging tasks. In this squib, we highlight issues that affect κ and that the community has largely neglected. First, we discuss the assumptions underlying different computations of the expected agreement component of κ. Second, we discuss how prevalence and bias affect the κ measure.
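A small sketch of both points: the two common computations of expected agreement (per-coder marginals, as in Cohen's κ, versus the pooled marginals of Siegel and Castellan), and a prevalence effect in which two datasets with identical observed agreement receive very different κ. All data are synthetic:

```python
from collections import Counter

def kappa(a, b, pooled=False):
    """Cohen's kappa (per-coder marginals) or, with pooled=True, the variant
    that derives chance agreement from one distribution pooled over coders."""
    n = len(a)
    a_o = sum(x == y for x, y in zip(a, b)) / n
    if pooled:
        p = Counter(a) + Counter(b)
        a_e = sum((c / (2 * n)) ** 2 for c in p.values())
    else:
        pa, pb = Counter(a), Counter(b)
        a_e = sum((pa[k] / n) * (pb[k] / n) for k in set(a) | set(b))
    return (a_o - a_e) / (1 - a_e)

# Different expected-agreement assumptions give different values
# whenever the coders' marginals differ.
c1 = ["y", "y", "y", "n"]
c2 = ["y", "y", "n", "n"]
print(kappa(c1, c2))               # 0.5   (per-coder marginals)
print(kappa(c1, c2, pooled=True))  # ~0.47 (pooled marginals)

# Prevalence: both pairs below agree on 90% of items, yet kappa diverges.
bal_a = ["y"] * 50 + ["n"] * 50
bal_b = ["y"] * 45 + ["n"] * 5 + ["n"] * 45 + ["y"] * 5
skew_a = ["y"] * 95 + ["n"] * 5
skew_b = ["y"] * 90 + ["n"] * 5 + ["y"] * 5
print(kappa(bal_a, bal_b))   # 0.8
print(kappa(skew_a, skew_b)) # ~ -0.05: skewed label prevalence collapses kappa
```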
Computational Linguistics (2003) 29 (1): 135–143.
Published: 01 March 2003
We discuss weighted deductive parsing and consider the problem of finding the derivation with the lowest weight. We show that Knuth's generalization of Dijkstra's algorithm for the shortest-path problem offers a general method to solve this problem. Our approach is modular in the sense that Knuth's algorithm is formulated independently from the weighted deduction system.
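A minimal agenda-based sketch of Knuth's generalization of Dijkstra's algorithm over a weighted deduction system. Rules are (antecedents, consequent, weight function); Knuth's conditions require each weight function to be monotone nondecreasing and superior (at least as large as each argument). The item encoding and rule are illustrative:

```python
import heapq

def knuth(axioms, rules):
    """Lowest derivation weight for every derivable item, Dijkstra-style.
    axioms: dict item -> axiom weight.
    rules:  list of (antecedents, consequent, f); f combines the best weights
            of the antecedents and must be monotone and superior."""
    triggers = {}
    for rule in rules:
        for ant in rule[0]:
            triggers.setdefault(ant, []).append(rule)
    best = {}
    agenda = [(w, item) for item, w in axioms.items()]
    heapq.heapify(agenda)
    while agenda:
        w, item = heapq.heappop(agenda)
        if item in best:
            continue                 # already finalized with a weight <= w
        best[item] = w               # pop order finalizes items, as in Dijkstra
        for ants, cons, f in triggers.get(item, []):
            if cons not in best and all(a in best for a in ants):
                heapq.heappush(agenda, (f(*(best[a] for a in ants)), cons))
    return best

# Toy deduction over spans; weights could be negative log probabilities.
rules = [((("A", 0, 1), ("B", 1, 2)), ("S", 0, 2), lambda x, y: x + y + 1.0)]
print(knuth({("A", 0, 1): 0.5, ("B", 1, 2): 0.2}, rules))  # S gets weight 1.7
```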
Computational Linguistics (2002) 28 (4): 545–553.
Published: 01 December 2002
Much natural language processing research implicitly assumes that word meanings are fixed in a language community, but in fact there is good evidence that different people probably associate slightly different meanings with words. We summarize some evidence for this claim from the literature and from an ongoing research project, and discuss its implications for natural language generation, especially for lexical choice, that is, choosing appropriate words for a generated text.
Computational Linguistics (2002) 28 (3): 389–397.
Published: 01 September 2002
Feature structures are used to convey linguistic information in a variety of linguistic formalisms. Various definitions of feature structures exist; one dimension of variation is typing: unlike untyped feature structures, typed ones associate a type with every structure and impose appropriateness constraints on the occurrences of features and on the values that they take. This work demonstrates the benefits that typing can carry even for linguistic formalisms that use untyped feature structures. We present a method for validating the consistency of (untyped) feature structure specifications by imposing a type discipline. This method facilitates a great number of compile-time checks: many possible errors can be detected before the grammar is used for parsing. We have constructed a type signature for an existing broad-coverage grammar of English and implemented a type inference algorithm that operates on the feature structure specifications in the grammar and reports incompatibilities with the signature. We have detected a large number of errors in the grammar, some of which are described in the article.
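A toy sketch of the idea of imposing a type discipline on untyped feature structures: a signature lists, for each type, the appropriate features and the types of their values, and a checker walks a structure and reports violations before parsing. This illustrates the principle only; the article's system performs full type inference over a broad-coverage grammar:

```python
# Hypothetical signature: type -> {feature: type of its value}. Structures of
# type "agr" may only bear NUM and PER; anything else is flagged early.
SIGNATURE = {
    "sign": {"CAT": "cat", "AGR": "agr"},
    "agr":  {"NUM": "num", "PER": "per"},
    "cat": {}, "num": {}, "per": {},
}

def check(fs, expected_type, path="root"):
    """Report features not appropriate for the expected type.
    `fs` is a nested dict (a feature structure) or an atomic value."""
    errors = []
    if not isinstance(fs, dict):
        return errors                              # atom: nothing to check
    appropriate = SIGNATURE.get(expected_type, {})
    for feat, val in fs.items():
        if feat not in appropriate:
            errors.append(f"{path}: {feat} not appropriate for type {expected_type}")
        else:
            errors.extend(check(val, appropriate[feat], f"{path}.{feat}"))
    return errors

fs = {"CAT": "np", "AGR": {"NUM": "sg", "GEN": "fem"}}  # GEN is a typo-like error
print(check(fs, "sign"))  # ['root.AGR: GEN not appropriate for type agr']
```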
Computational Linguistics (2002) 28 (1): 71–76.
Published: 01 March 2002
A data-oriented parsing (DOP) model for statistical parsing associates fragments of linguistic representations with numerical weights, where these weights are estimated by normalizing the empirical frequency of each fragment in a training corpus (see Bod [1998] and references cited therein). This note observes that this estimation method is biased and inconsistent; that is, the estimated distribution does not in general converge on the true distribution as the size of the training corpus increases.
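For reference, the estimator in question normalizes each fragment's corpus frequency by the total frequency of fragments sharing its root label. A minimal sketch of that computation (the fragment encoding is illustrative; the note's bias argument concerns this estimator, not this code):

```python
from collections import Counter

def dop1_weights(fragment_occurrences):
    """Relative-frequency fragment weights: each fragment's count divided by
    the total count of fragments with the same root label."""
    counts = Counter(fragment_occurrences)          # (root, fragment) -> frequency
    root_totals = Counter()
    for (root, _), c in counts.items():
        root_totals[root] += c
    return {frag: c / root_totals[frag[0]] for frag, c in counts.items()}

frags = ([("NP", "NP -> Det N")] * 3 + [("NP", "NP -> Pro")] +
         [("S", "S -> NP VP")] * 2)
print(dop1_weights(frags))  # NP -> Det N: 0.75, NP -> Pro: 0.25, S -> NP VP: 1.0
```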
Computational Linguistics (2001) 27 (4): 569–577.
Published: 01 December 2001
Pronoun resolution studies compute performance inconsistently and describe results incompletely. We propose a new reporting standard that improves the exposition of individual results and the possibility for readers to compare techniques across studies. We also propose an informative new performance metric, the resolution rate, for use in addition to precision and recall.
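As a sketch of the kind of reporting involved. Note that the definition of resolution rate used below (correct resolutions over all anaphoric pronouns in the corpus, a denominator the system cannot shrink) is our reading of the proposal; consult Byron (2001) for the authoritative definition:

```python
def pronoun_metrics(correct, attempted, system_resolvable, anaphoric_total):
    """Illustrative metrics for a pronoun resolution study.

    correct:           pronouns resolved to a right antecedent
    attempted:         pronouns the system tried to resolve
    system_resolvable: pronouns the system deems in-scope (varies by study)
    anaphoric_total:   all anaphoric pronouns in the evaluation texts
    """
    precision = correct / attempted
    recall = correct / system_resolvable         # denominator varies across studies
    resolution_rate = correct / anaphoric_total  # fixed corpus-level denominator (assumed reading)
    return precision, recall, resolution_rate

# 60 correct of 80 attempted; the system scoped 90 of 120 anaphoric pronouns.
print(pronoun_metrics(60, 80, 90, 120))  # (0.75, 0.667, 0.5)
```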
Computational Linguistics (2001) 27 (2): 277–285.
Published: 01 June 2001
Shieber's abstract parsing algorithm (Shieber 1992) for unification grammars is an extension of Earley's algorithm (Earley 1970) for context-free grammars to feature structures. In this paper, we show that, under certain conditions, Shieber's algorithm produces what we call a nonminimal derivation: a parse tree which contains additional features that are not in the licensing productions. While Shieber's definition of parse tree allows for such nonminimal derivations, we claim that they should be viewed as invalid. We describe the sources of the nonminimal derivation problem, and propose a precise definition of minimal parse tree, as well as a modification to Shieber's algorithm which ensures minimality, although at some computational cost.
Computational Linguistics (2001) 27 (1): 123–131.
Published: 01 March 2001
Proper nouns form an open class, making the incompleteness of manually or automatically learned classification rules an obvious problem. The purpose of this paper is twofold: first, to suggest the use of a complementary “backup” method to increase the robustness of any hand-crafted or machine-learning-based NE tagger; and second, to explore the effectiveness of using more fine-grained evidence—namely, syntactic and semantic contextual knowledge—in classifying NEs.
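A toy cascade illustrating the "backup" idea: a primary tagger (hand-crafted or learned) answers when it can, and contextual evidence classifies the unseen proper nouns it misses. The cue words, labels, and names are invented for illustration:

```python
PRIMARY = {"IBM": "ORG", "Clinton": "PER"}   # stand-in for rules or a learned model

ORG_CUES = {"Inc.", "Corp.", "Ltd.", "shares"}
PER_CUES = {"Mr.", "Mrs.", "Dr.", "said"}

def backup(context_words):
    """Contextual backup: classify an unseen proper noun from cue words
    in its surrounding context."""
    words = set(context_words)
    if words & ORG_CUES:
        return "ORG"
    if words & PER_CUES:
        return "PER"
    return "MISC"

def tag(name, context_words):
    # Primary tagger first; fall back on context for out-of-list names.
    return PRIMARY.get(name) or backup(context_words)

print(tag("Clinton", []))                             # PER (primary)
print(tag("Initech", ["Initech", "shares", "rose"]))  # ORG (via the backup)
```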
Computational Linguistics (2000) 26 (4): 629–637.
Published: 01 December 2000
In this paper, it is argued that “coreference” annotations, as performed in the MUC community for example, go well beyond annotation of the relation of coreference proper. As a result, it is not always clear what semantic relation these annotations are encoding. The paper discusses a number of problems with these annotations and concludes that rethinking of the coreference task is needed before the task is expanded. In particular, it suggests a division of labor whereby annotation of the coreference relation proper is separated from other tasks such as annotation of bound anaphora and of the relation between a subject and a predicative NP.
Computational Linguistics (2000) 26 (2): 251–259.
Published: 01 June 2000
Some types of documents need to meet size constraints, such as fitting into a limited number of pages. This can be a difficult constraint to enforce in a pipelined natural language generation (NLG) system, because size is mostly determined by content decisions, which usually are made at the beginning of the pipeline, but size cannot be accurately measured until the document has been completely processed by the NLG system. I present experimental data on the performance of single-solution pipeline, multiple-solution pipeline, and revision-based variants of the STOP system (which produces personalized smoking-cessation leaflets) in meeting a size constraint. This shows that a multiple-solution pipeline does much better than a single-solution pipeline, and that a revision-based system does best of all.
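A toy contrast between the single-solution and multiple-solution strategies under a size constraint. The function names and the size model are hypothetical, and the revision-based variant, which edits an over-long draft after the fact, is omitted for brevity:

```python
def realize(content_items):
    """Stand-in for the later pipeline stages: turn chosen content into text."""
    return " ".join(f"[{item}]" for item in content_items)

def single_solution(content, char_limit):
    """Commit to one content selection up front; size is only known at the end."""
    text = realize(content)
    return text if len(text) <= char_limit else None  # too late to choose less

def multiple_solution(content, char_limit):
    """Pass several candidate selections through the pipeline and keep the
    most informative realized document that still fits."""
    for k in range(len(content), 0, -1):              # prefer more content
        text = realize(content[:k])
        if len(text) <= char_limit:
            return text
    return None

content = ["smoking history", "health risks", "quitting tips", "support contacts"]
print(single_solution(content, char_limit=55))    # None: the full document is too long
print(multiple_solution(content, char_limit=55))  # longest candidate that fits
```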