Shuly Wintner
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2014) 40 (2): 449–468.
Published: 01 June 2014
Abstract
We propose a framework for using multiple sources of linguistic information in the task of identifying multiword expressions in natural language texts. We define various linguistically motivated classification features and introduce novel ways for computing them. We then manually define interrelationships among the features, and express them in a Bayesian network. The result is a powerful classifier that can identify multiword expressions of various types and multiple syntactic constructions in text corpora. Our methodology is unsupervised and language-independent; it requires relatively few language resources and is thus suitable for a large number of languages. We report results on English, French, and Hebrew, and demonstrate a significant improvement in identification accuracy, compared with less sophisticated baselines.
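The abstract combines several feature values through a hand-specified Bayesian network. As a minimal sketch of that general idea (not the article's network), the block below defines a hypothetical three-node network in which a binary MWE variable influences two binary features A and B, and B also depends on A to encode an interrelationship between features. The structure, the feature names, and every probability are invented for illustration; the posterior is computed by plain enumeration.

```python
from itertools import product

# Hypothetical network (structure and numbers are illustrative only):
#   MWE -> A, MWE -> B, A -> B
# A and B stand for two binary linguistic features, e.g. "fixed word order"
# and "high association score"; the edge A -> B encodes an interrelationship.
P_MWE = 0.2                                    # prior P(MWE=True)
P_A = {True: 0.7, False: 0.2}                  # P(A=True | MWE)
P_B = {(True, True): 0.9, (True, False): 0.6,  # P(B=True | MWE, A)
       (False, True): 0.3, (False, False): 0.1}


def bern(p_true, value):
    """Probability of a binary outcome given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(mwe, a, b):
    """Joint probability, factorized along the network's edges."""
    return bern(P_MWE, mwe) * bern(P_A[mwe], a) * bern(P_B[(mwe, a)], b)


def posterior_mwe(evidence):
    """P(MWE=True | evidence), summing out unobserved features by enumeration."""
    mass = {True: 0.0, False: 0.0}
    for mwe, a, b in product([True, False], repeat=3):
        values = {"A": a, "B": b}
        if all(values[name] == v for name, v in evidence.items()):
            mass[mwe] += joint(mwe, a, b)
    return mass[True] / (mass[True] + mass[False])


print(round(posterior_mwe({"A": True, "B": True}), 3))
```

With both hypothetical features observed as true, the posterior rises from the prior of 0.2 to about 0.72; observing only A as false lowers it to about 0.09. Enumeration is exponential in the number of variables, which is acceptable only for a toy network of this size.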
Computational Linguistics (2013) 39 (4): 999–1023.
Published: 01 December 2013
Abstract
Translation models used for statistical machine translation are compiled from parallel corpora that are manually translated. The common assumption is that parallel texts are symmetrical: The direction of translation is deemed irrelevant and is consequently ignored. Much research in Translation Studies indicates that the direction of translation matters, however, as translated language (translationese) has many unique properties. It has already been shown that phrase tables constructed from parallel corpora translated in the same direction as the translation task outperform those constructed from corpora translated in the opposite direction. We reconfirm that this is indeed the case, but emphasize the importance of also using texts translated in the “wrong” direction. We take advantage of information pertaining to the direction of translation in constructing phrase tables by adapting the translation model to the special properties of translationese. We explore two adaptation techniques: First, we create a mixture model by interpolating phrase tables trained on texts translated in the “right” and the “wrong” directions. The weights for the interpolation are determined by minimizing perplexity. Second, we define entropy-based measures that estimate the correspondence of target-language phrases to translationese, thereby eliminating the need to annotate the parallel corpus with information pertaining to the direction of translation. We show that incorporating these measures as features in the phrase tables of statistical machine translation systems results in consistent, statistically significant improvement in the quality of the translation.
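The first adaptation technique above interpolates two phrase tables and picks the weight that minimizes perplexity. The block below is a minimal sketch of linear interpolation with a grid search over the weight, assuming toy tables stored as dicts from hypothetical (source, target) phrase pairs to probabilities and a small held-out list of pairs; the article's actual optimization procedure and data may differ.

```python
import math


def mixture_prob(p_right, p_wrong, lam, pair, floor=1e-9):
    """lam * P_right + (1 - lam) * P_wrong for one phrase pair.
    The floor keeps log() defined for pairs missing from both tables."""
    p = lam * p_right.get(pair, 0.0) + (1.0 - lam) * p_wrong.get(pair, 0.0)
    return max(p, floor)


def perplexity(p_right, p_wrong, lam, heldout):
    """exp of the average negative log-probability of the held-out pairs."""
    nll = -sum(math.log(mixture_prob(p_right, p_wrong, lam, pair)) for pair in heldout)
    return math.exp(nll / len(heldout))


def best_weight(p_right, p_wrong, heldout, steps=100):
    """Grid-search the interpolation weight that minimizes held-out perplexity."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda lam: perplexity(p_right, p_wrong, lam, heldout))


# Toy tables: "right" = translated in the task direction, "wrong" = opposite direction.
p_right = {("maison", "house"): 0.8, ("chat", "cat"): 0.1}
p_wrong = {("maison", "house"): 0.2, ("chat", "cat"): 0.9}
heldout = [("maison", "house")] * 3 + [("chat", "cat")]
print(best_weight(p_right, p_wrong, heldout))
```

On this toy data the search returns 0.76, close to the analytic optimum of about 0.7604: neither table alone is best, because each covers some held-out pairs poorly. A grid search is used here only for transparency; an iterative optimizer would serve equally well.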
Computational Linguistics (2012) 38 (4): 799–825.
Published: 01 December 2012
Abstract
We investigate the differences between language models compiled from original target-language texts and those compiled from texts manually translated into the target language. Corroborating established observations of Translation Studies, we demonstrate that the latter are significantly better predictors of translated sentences than the former, and hence fit the reference set better. Furthermore, translated texts yield better language models for statistical machine translation than original texts.
Computational Linguistics (2011) 37 (1): 29–74.
Published: 01 March 2011
Abstract
Development of large-scale grammars for natural languages is a complicated endeavor: Grammars are developed collaboratively by teams of linguists, computational linguists, and computer scientists, in a process very similar to the development of large-scale software. Grammars are written in grammatical formalisms that resemble very-high-level programming languages, and are thus very similar to computer programs. Yet grammar engineering is still in its infancy: Few grammar development environments support sophisticated modularized grammar development, in the form of distribution of the grammar development effort, combination of sub-grammars, separate compilation and automatic linkage, information encapsulation, and so forth. This work provides preliminary foundations for modular construction of (typed) unification grammars for natural languages. Much of the information in such formalisms is encoded by the type signature, and we subsequently address the problem through the distribution of the signature among the different modules. We define signature modules and provide operators of module combination. Modules may specify only partial information about the components of the signature and may communicate through parameters, similarly to function calls in programming languages. Our definitions are inspired by methods and techniques of programming language theory and software engineering and are motivated by the actual needs of grammar developers, obtained through a careful examination of existing grammars. We show that our definitions meet these needs by conforming to a detailed set of desiderata. We demonstrate the utility of our definitions by providing a modular design of the HPSG grammar of Pollard and Sag.
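One property any combination of signature modules must preserve is that the resulting subtype relation is still a partial order. The block below is a deliberately simplified illustration, not the article's definitions: it treats a module as nothing more than a set of (subtype, supertype) pairs, combines modules by set union, and checks for cycles with a depth-first search. Parameters, partial information, appropriateness specifications, and the article's actual combination operators are all omitted, and the functions `combine` and `is_partial_order` are hypothetical names.

```python
def combine(*modules):
    """Toy combination: union of the (subtype, supertype) pairs of all modules."""
    combined = set()
    for module in modules:
        combined |= module
    return combined


def is_partial_order(pairs):
    """True if the subtype graph has no cycle through distinct types, so its
    reflexive-transitive closure is antisymmetric (a partial order)."""
    graph = {}
    for sub, sup in pairs:
        if sub != sup:  # reflexive pairs cannot violate antisymmetry
            graph.setdefault(sub, set()).add(sup)

    state = {}  # type -> "active" while on the DFS stack, "done" afterwards

    def has_cycle(node):
        state[node] = "active"
        for nxt in graph.get(node, ()):
            if state.get(nxt) == "active":
                return True
            if nxt not in state and has_cycle(nxt):
                return True
        state[node] = "done"
        return False

    return not any(node not in state and has_cycle(node) for node in list(graph))


core = {("noun", "head"), ("verb", "head")}
extension = {("head", "category")}
conflicting = {("head", "noun")}
print(is_partial_order(combine(core, extension)), is_partial_order(combine(core, conflicting)))
```

Here the first combination is acceptable, while adding the hypothetical pair ("head", "noun") creates the cycle noun < head < noun and is rejected.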
Computational Linguistics (2009) 35 (4): 641–644.
Published: 01 December 2009
Computational Linguistics (2008) 34 (3): 429–448.
Published: 01 September 2008
Abstract
Words in Semitic languages are formed by combining two morphemes: a root and a pattern. The root consists of consonants only, by default three, and the pattern is a combination of vowels and consonants, with non-consecutive “slots” into which the root consonants are inserted. Identifying the root of a given word is an important task, considered to be an essential part of the morphological analysis of Semitic languages, and information on roots is important for linguistic research as well as for practical applications. We present a machine learning approach, augmented by limited linguistic knowledge, to the problem of identifying the roots of Semitic words. Although programs exist that can extract the roots of words in Arabic and Hebrew, they all depend on the labor-intensive construction of large-scale lexicons, which are components of full-scale morphological analyzers. The advantage of our method is an automation of this process, avoiding the bottleneck of having to laboriously list the root and pattern of each lexeme in the language. To the best of our knowledge, this is the first application of machine learning to this problem, and one of the few attempts to directly address non-concatenative morphology using machine learning. More generally, our results shed light on the problem of combining classifiers under (linguistically motivated) constraints.
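The closing sentence mentions combining classifiers under linguistically motivated constraints. As a hedged sketch of that general idea only, the block below assumes a hypothetical per-letter classifier that outputs the probability that each letter of a word is a root radical, and then imposes one constraint: the root is exactly three letters appearing in order in the word. This simplification ignores weak roots whose radicals do not surface, and it is not the article's model. The example uses the Arabic word maktab ("office"), whose root is k-t-b; the scores are invented.

```python
from itertools import combinations


def best_root(word, scores, root_len=3):
    """Return the highest-scoring ordered subsequence of root_len letters.

    scores[i] is a (hypothetical) classifier probability that word[i] is a
    root letter. A candidate root is scored by the product of p for chosen
    letters and (1 - p) for the remaining letters."""
    best, best_score = None, float("-inf")
    for idx in combinations(range(len(word)), root_len):  # preserves letter order
        s = 1.0
        for i, p in enumerate(scores):
            s *= p if i in idx else (1.0 - p)
        if s > best_score:
            best, best_score = "".join(word[i] for i in idx), s
    return best


print(best_root("maktab", [0.3, 0.05, 0.9, 0.8, 0.05, 0.85]))
```

The constraint matters when the independent per-letter decisions disagree with it, for example when four letters score above 0.5: the search still returns exactly three radicals in their surface order.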
Computational Linguistics (2006) 32 (1): 49–82.
Published: 01 March 2006
Abstract
We introduce finite-state registered automata (FSRAs), a new computational device within the framework of finite-state technology, specifically tailored for implementing non-concatenative morphological processes. This model extends and augments existing finite-state techniques, which are presently not optimized for describing these kinds of phenomena. We first define the model and discuss its mathematical and computational properties. Then, we provide an extended regular language whose expressions denote FSRAs. Finally, we exemplify the utility of the model by providing several examples of complex morphological and phonological phenomena, which are elegantly implemented with FSRAs.
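To make the idea of a register concrete, the block below is a toy automaton loosely modeled on that description, not the article's formal definition: transitions may write a value to a single register or require the register to hold a given value. It recognizes an abstract circumfix-like language in which prefix `a` must co-occur with suffix `x`, and prefix `b` with suffix `y`, around a stem of one or more `c`s. The transition format and all names are assumptions made for this illustration.

```python
# A transition: (source, symbol, action, target). The action is None,
# ("write", v) to store v in the register, or ("read", v) to require it.
TRANSITIONS = [
    (0, "a", ("write", "A"), 1),
    (0, "b", ("write", "B"), 1),
    (1, "c", None, 2),
    (2, "c", None, 2),
    (2, "x", ("read", "A"), 3),
    (2, "y", ("read", "B"), 3),
]
START, FINALS = 0, {3}


def accepts(word):
    """Simulate the automaton nondeterministically over (state, register) pairs."""
    configs = {(START, None)}
    for ch in word:
        nxt = set()
        for state, reg in configs:
            for src, sym, action, tgt in TRANSITIONS:
                if src != state or sym != ch:
                    continue
                if action is None:
                    nxt.add((tgt, reg))
                elif action[0] == "write":
                    nxt.add((tgt, action[1]))
                elif action[0] == "read" and reg == action[1]:
                    nxt.add((tgt, reg))
        configs = nxt
    return any(state in FINALS for state, _ in configs)


print([w for w in ["accx", "bccy", "accy", "bcx", "ax"] if accepts(w)])
```

Because the register holds one of finitely many values, this particular language could also be recognized by an ordinary finite-state automaton that duplicates the stem states once per prefix; the register avoids that duplication and keeps the prefix-suffix dependency in one place.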
Computational Linguistics (2004) 30 (2): 237–239.
Published: 01 June 2004
Computational Linguistics (2002) 28 (3): 389–397.
Published: 01 September 2002
Abstract
Feature structures are used to convey linguistic information in a variety of linguistic formalisms. Various definitions of feature structures exist; one dimension of variation is typing: unlike untyped feature structures, typed ones associate a type with every structure and impose appropriateness constraints on the occurrences of features and on the values that they take. This work demonstrates the benefits that typing can carry even for linguistic formalisms that use untyped feature structures. We present a method for validating the consistency of (untyped) feature structure specifications by imposing a type discipline. This method facilitates a great number of compile-time checks: many possible errors can be detected before the grammar is used for parsing. We have constructed a type signature for an existing broad-coverage grammar of English and implemented a type inference algorithm that operates on the feature structure specifications in the grammar and reports incompatibilities with the signature. We have detected a large number of errors in the grammar, some of which are described in the article.
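The method described above imposes a type discipline on untyped feature structures and reports incompatibilities with a signature. The block below is a minimal sketch of that kind of check, not the authors' algorithm: it assumes a hypothetical single-inheritance signature in which each type introduces some features, infers for every node of a nested-dict feature structure the most general type whose appropriate features cover the node's features, and records an error when no type does. Value-type constraints are not checked.

```python
# Hypothetical toy signature: each type has one parent and introduces features.
PARENT = {"top": None, "sign": "top", "word": "sign", "phrase": "sign", "agr": "top"}
INTRODUCES = {"top": set(), "sign": {"PHON", "AGR"}, "word": set(),
              "phrase": {"DTRS"}, "agr": {"NUM", "PER"}}


def appropriate(t):
    """All features appropriate for type t, inherited along its ancestors."""
    feats = set()
    while t is not None:
        feats |= INTRODUCES[t]
        t = PARENT[t]
    return feats


def depth(t):
    """Distance from t to the root type; smaller means more general."""
    d = 0
    while PARENT[t] is not None:
        t, d = PARENT[t], d + 1
    return d


def infer(fs, path="", errors=None):
    """Infer the most general type licensing each node's features; collect errors."""
    if errors is None:
        errors = []
    if not isinstance(fs, dict):  # atomic value: nothing to check here
        return None, errors
    feats = set(fs)
    candidates = [t for t in PARENT if feats <= appropriate(t)]
    if candidates:
        inferred = min(candidates, key=depth)
    else:
        inferred = None
        errors.append(f"{path or '<root>'}: no type licenses {sorted(feats)}")
    for feat, value in fs.items():
        infer(value, f"{path}/{feat}", errors)
    return inferred, errors


print(infer({"PHON": "dogs", "AGR": {"NUM": "pl", "CASE": "nom"}}))
```

On this input the root node is typed as `sign`, but the `AGR` value carries a feature (`CASE`) that no type in the toy signature licenses, so the check reports `/AGR: no type licenses ['CASE', 'NUM']` before any parsing takes place.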