Shay B. Cohen
Journal articles 1–3 of 3
Computational Linguistics (2021) 47 (2): 445–476.
Published: 13 July 2021
Abstract
We consider the task of cross-lingual semantic parsing in the style of Discourse Representation Theory (DRT), where knowledge from annotated corpora in a resource-rich language is transferred via bitext to guide learning in other languages. We introduce 𝕌niversal Discourse Representation Theory (𝕌DRT), a variant of DRT that explicitly anchors semantic representations to tokens in the linguistic input. We develop a semantic parsing framework based on the Transformer architecture and use it to obtain semantic resources in multiple languages following two learning schemes. The many-to-one approach translates non-English text to English and then runs a relatively accurate English parser on the translated text, while the one-to-many approach translates gold-standard English text to non-English text and trains multiple parsers (one per language) on the translations. Experimental results on the Parallel Meaning Bank show that our proposal outperforms strong baselines by a wide margin and can be used to construct (silver-standard) meaning banks for 99 languages.
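The two schemes amount to translate-then-parse and translate-then-train pipelines. The following is a minimal Python sketch of how they differ, assuming hypothetical translate, english_parser, and train_parser helpers passed in as parameters; it illustrates the idea, not the authors' implementation.

    # Minimal sketch of the two cross-lingual transfer schemes.
    # `translate`, `english_parser`, and `train_parser` are assumed
    # stand-ins, not part of any released code.

    def many_to_one(texts, src_lang, translate, english_parser):
        """Translate non-English input into English, then run the
        relatively accurate English parser on the translations."""
        return [english_parser(translate(t, src=src_lang, tgt="en"))
                for t in texts]

    def one_to_many(english_gold, target_langs, translate, train_parser):
        """Pair translations of gold-standard English text with the
        original meanings and train one parser per target language."""
        parsers = {}
        for lang in target_langs:
            silver = [(translate(text, src="en", tgt=lang), meaning)
                      for text, meaning in english_gold]
            parsers[lang] = train_parser(silver)  # one parser per language
        return parsers

Passing the translation system and parser in as arguments keeps the sketch agnostic to the actual MT and parsing models used.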
Computational Linguistics (2016) 42 (3): 421–455.
Published: 01 September 2016
Abstract
We describe a recognition algorithm for a subset of binary linear context-free rewriting systems (LCFRS) with running time O(n^{ωd}), where M(m) = O(m^ω) is the running time for m × m matrix multiplication and d is the “contact rank” of the LCFRS: the maximal number of combination and non-combination points that appear in the grammar rules. We also show that this algorithm can be used as a subroutine to obtain a recognition algorithm for general binary LCFRS with running time O(n^{ωd+1}). The best currently known ω is smaller than 2.38. Our result provides another proof of the best known result for parsing mildly context-sensitive formalisms such as combinatory categorial grammars, head grammars, linear indexed grammars, and tree-adjoining grammars, which can be parsed in time O(n^{4.76}). It also shows that inversion transduction grammars can be parsed in time O(n^{5.76}). In addition, binary LCFRS subsumes many other formalisms and types of grammars, for some of which we also improve the asymptotic complexity of parsing.
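As a back-of-the-envelope check of the stated exponents (my arithmetic, not text from the paper), both bounds are consistent with a contact rank of d = 2:

    % Assuming d = 2 and \omega < 2.38:
    \[
      O(n^{\omega d}) = O(n^{2\omega}) \subseteq O(n^{4.76})
      \quad \text{(TAG, CCG, head grammars, linear indexed grammars)}
    \]
    \[
      O(n^{\omega d + 1}) = O(n^{2\omega + 1}) \subseteq O(n^{5.76})
      \quad \text{(inversion transduction grammars, via the general binary case)}
    \]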
Computational Linguistics (2012) 38 (3): 479–526.
Published: 01 September 2012
Abstract
Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures, and they are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply to both the supervised and the unsupervised settings. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard, and we therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk.
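For a probabilistic grammar, supervised empirical risk minimization under the log-loss has a closed form: the relative-frequency (maximum-likelihood) estimate of each rule's probability. The Python sketch below illustrates that classical fact under an assumed rule encoding; it is not the paper's algorithm.

    from collections import Counter, defaultdict
    import math

    def erm_supervised(derivations):
        """Minimize empirical log-loss over a treebank of derivations.
        Each derivation is a list of (lhs, rhs) rule tokens; the minimizer
        is the relative-frequency estimate of each rule's probability."""
        counts = Counter(rule for deriv in derivations for rule in deriv)
        lhs_totals = defaultdict(int)
        for (lhs, _rhs), c in counts.items():
            lhs_totals[lhs] += c
        return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

    def empirical_risk(theta, derivations):
        """Average log-loss (negative log-likelihood per derivation)."""
        return -sum(math.log(theta[rule])
                    for deriv in derivations
                    for rule in deriv) / len(derivations)

In the unsupervised setting the derivations are latent and the same objective must be marginalized over them, which is where the NP-hardness result and the EM-like approximation come in.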