Xiaochang Peng (3 journal articles)
Computational Linguistics (2019) 45 (2): 339–379.
Published: 01 June 2019
Abstract
We present algorithms for extracting Hyperedge Replacement Grammar (HRG) rules from a graph along with a vertex order. Our algorithms are based on finding a tree decomposition of smallest width, relative to the vertex order, and then extracting one rule for each node in this structure. The assumption of a fixed order for the vertices of the input graph makes it possible to solve the problem in polynomial time, in contrast to the fact that the problem of finding optimal tree decompositions for a graph is NP-hard. We also present polynomial-time algorithms for parsing based on our HRGs, where the input is a vertex sequence and the output is a graph structure. The intended application of our algorithms is grammar extraction and parsing for semantic representation of natural language. We apply our algorithms to data annotated with Abstract Meaning Representations and report on the characteristics of the resulting grammars.
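The polynomial-time claim rests on the fact that, once a vertex order is fixed, the width of a decomposition consistent with that order can be computed efficiently. As a rough illustration (not the paper's extraction algorithm), here is the classic elimination-game computation of the width of a given vertex order; the function name and graph encoding are assumptions made for this sketch.

    from itertools import combinations

    def elimination_width(edges, order):
        """Width of the tree decomposition induced by eliminating
        vertices in the given fixed order (the classic elimination
        game). Polynomial time, in contrast to minimizing over all
        orders, which is NP-hard."""
        adj = {v: set() for v in order}
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        position = {v: i for i, v in enumerate(order)}
        width = 0
        for v in order:
            # Neighbors of v that come later in the order form v's bag.
            later = {u for u in adj[v] if position[u] > position[v]}
            width = max(width, len(later))
            # Fill in: the later neighbors must co-occur in a bag once
            # v is eliminated, so connect them into a clique.
            for u, w in combinations(later, 2):
                adj[u].add(w)
                adj[w].add(u)
        return width

    # A 4-cycle has width 2 under this (indeed any) vertex order.
    print(elimination_width([(0, 1), (1, 2), (2, 3), (3, 0)], [0, 1, 2, 3]))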
Computational Linguistics (2019) 45 (2): 293–337.
Published: 01 June 2019
Abstract
Machine learning, including neural network techniques, has been applied to virtually every domain in natural language processing. One problem that has been somewhat resistant to effective machine learning solutions is text normalization for speech applications such as text-to-speech synthesis (TTS). In this application, one must decide, for example, that 123 is verbalized as one hundred twenty three in 123 pages but as one twenty three in 123 King Ave. For this task, state-of-the-art industrial systems depend heavily on hand-written language-specific grammars. We propose neural network models that treat text normalization for TTS as a sequence-to-sequence problem, in which the input is a text token in context, and the output is the verbalization of that token. We find that the most effective model, in both accuracy and efficiency, is one where the sentential context is computed once and the results of that computation are combined with the computation of each token in sequence to compute the verbalization. This model allows for a great deal of flexibility in terms of representing the context, and also allows us to integrate tagging and segmentation into the process. These models perform very well overall, but occasionally they will predict wildly inappropriate verbalizations, such as reading 3 cm as three kilometers. Although rare, such verbalizations are a major issue for TTS applications. We thus use finite-state covering grammars to guide the neural models away from such “unrecoverable” errors, either during both training and decoding or during decoding alone. Such grammars can largely be learned from data.
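The covering-grammar idea can be illustrated without any neural machinery: the grammar enumerates the verbalizations a token class may license, and decoding is restricted to that set, which is exactly what rules out errors like reading 3 cm as three kilometers. The sketch below is a toy pure-Python stand-in; in the paper this role is played by finite-state grammars, and all names and data here are illustrative assumptions.

    # Hypothetical covering grammar for simple measure expressions; a
    # real system would compile this as a finite-state transducer.
    NUMBERS = {"1": "one", "2": "two", "3": "three"}
    UNITS = {"cm": ["centimeter", "centimeters"],
             "km": ["kilometer", "kilometers"]}

    def covered_verbalizations(token):
        """Verbalizations the covering grammar licenses for a token
        like '3 cm'; empty list if the token is outside the grammar."""
        digits, _, unit = token.partition(" ")
        if digits in NUMBERS and unit in UNITS:
            return [NUMBERS[digits] + " " + u for u in UNITS[unit]]
        return []

    def constrained_decode(token, neural_scores):
        """Pick the best-scoring hypothesis the grammar licenses,
        falling back to the unconstrained best for uncovered tokens."""
        allowed = covered_verbalizations(token)
        candidates = [v for v in neural_scores if v in allowed]
        return max(candidates or neural_scores, key=neural_scores.get)

    # The (made-up) neural scores wrongly prefer "three kilometers",
    # but the covering grammar prunes that hypothesis away.
    scores = {"three kilometers": -0.1, "three centimeters": -0.4}
    print(constrained_decode("3 cm", scores))  # three centimeters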
Computational Linguistics (2018) 44 (1): 85–118.
Published: 01 March 2018
Abstract
Motivated by the task of semantic parsing, we describe a transition system that generalizes standard transition-based dependency parsing techniques to generate a graph rather than a tree. Our system includes a cache with fixed size m, and we characterize the relationship between the parameter m and the class of graphs that can be produced through the graph-theoretic concept of tree decomposition. We find empirically that small cache sizes cover a high percentage of sentences in existing semantic corpora.
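A toy sketch of what such a cache-based configuration can look like: only vertices currently in the fixed-size cache may be connected by an edge, which is why m bounds the class of producible graphs. The transition names (shift, arc, pop) and their semantics here are illustrative assumptions, not the system defined in the paper.

    class CacheTransitionParser:
        """Toy cache transition system: a configuration holds a stack,
        a fixed-size cache, an input buffer, and the edges built so
        far. Only cached vertices can be connected, so the cache size
        m limits the complexity of the output graph."""

        def __init__(self, vertices, m):
            self.stack = []          # (slot, vertex) pairs evicted from the cache
            self.cache = [None] * m  # fixed-size working set
            self.buffer = list(vertices)
            self.edges = set()

        def shift(self, slot):
            """Move the next buffer vertex into the given cache slot,
            pushing the evicted occupant onto the stack."""
            self.stack.append((slot, self.cache[slot]))
            self.cache[slot] = self.buffer.pop(0)

        def arc(self, i, j):
            """Add an edge between the vertices in cache slots i and j."""
            if self.cache[i] is not None and self.cache[j] is not None:
                self.edges.add((self.cache[i], self.cache[j]))

        def pop(self):
            """Return the most recently evicted vertex to its slot."""
            slot, v = self.stack.pop()
            self.cache[slot] = v

    # Build the path a-b-c with a cache of size m = 2.
    p = CacheTransitionParser(["a", "b", "c"], m=2)
    p.shift(0); p.shift(1); p.arc(0, 1)  # edge (a, b)
    p.shift(0); p.arc(0, 1)              # c evicts a; edge (c, b)
    p.pop()                              # c is done; a returns to slot 0
    print(sorted(p.edges))               # [('a', 'b'), ('c', 'b')]

A real system would also need to decide when a cached vertex is finished; this sketch only shows how a fixed m keeps each step, and hence parsing, polynomial.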