Abstract
The formalism for Lexical-Functional Grammar (LFG) was introduced in the 1980s as one of the first constraint-based grammatical formalisms for natural language. It has led to substantial contributions to the linguistic literature and to the construction of large-scale descriptions of particular languages. Investigations of its mathematical properties have shown that, without further restrictions, the recognition, emptiness, and generation problems are undecidable, and that they are intractable in the worst case even with commonly applied restrictions. However, grammars of real languages appear not to invoke the full expressive power of the formalism, as indicated by the fact that algorithms and implementations for recognition and generation have been developed that run—even for broad-coverage grammars—in typically polynomial time. This article formalizes some restrictions on the notation and its interpretation that are compatible with conventions and principles that have been implicit or informally stated in linguistic theory. We show that LFG grammars that respect these restrictions, while still suitable for the description of natural languages, are equivalent to linear context-free rewriting systems and allow for tractable computation.
1. Introduction
Grammar formalisms are attractive for modeling and processing natural language if they are expressive enough to describe the structure of natural languages and if they are computationally tractable. While the first requirement is certainly satisfied by the majority of unification-based grammar formalisms, the second is certainly not. Computational performance is minimally identified with the complexity of the universal recognition/parsing problem—that is, the problem of determining for any given grammar G and terminal string s whether G assigns a feature structure to s. Because a problem is computationally tractable if it has a polynomial-time solution, formalisms are thus considered tractable if their recognition/parsing problem can be solved in polynomial time.
For almost all unrestricted unification-based grammar formalisms, the recognition problem has been known to be undecidable since the earliest days of unification grammar (see, e.g., Kaplan and Bresnan 1982; Johnson 1988; Blackburn and Spaan 1993). To sidestep this undecidability issue in the design of Lexical-Functional Grammar (LFG), Kaplan and Bresnan (1982) introduced a constraint, later called the Off-line Parsability Constraint, that guarantees decidability of the recognition problem. The Off-line Parsability Constraint proscribes empty productions and nonbranching dominance chains and thus bounds the number and size of the c-structures of a string by a function of the length of that string. This works for recognition/parsing because the size of the set of constraints on each f(eature)-structure (the f-description) of a string is always bounded by the size of its c-structures. The parsing problem is decidable with off-line parsability, but that constraint is not sufficient to ensure tractability: The parsing problem for off-line parsable LFG grammars is known to be NP-complete (Berwick 1982).
The universal generation problem is another problem of practical importance. This is the problem of determining for an arbitrary grammar G and an arbitrary f-structure F whether G derives any terminal string with F. This has been shown to be undecidable even for grammars that are off-line parsable (Wedekind 2014).1 Note further that the emptiness problem (the problem of determining for an arbitrary grammar G whether L(G) = ∅) is also undecidable for off-line parsable grammars.2 This problem is of formal interest but perhaps has less practical significance.
A number of variants of the Off-line Parsability Constraint have been proposed since Kaplan and Bresnan’s (1982) original formulation (see, e.g., Jaeger et al. [2005] for a survey). However, none of them is particularly effective in improving the computational properties of LFG and other unification-based formalisms. We therefore here ignore Off-line Parsability constraints, even though some of them may help in increasing efficiency and may also be required for linguistic interpretation: They are not sufficient for attaining recognition tractability and decidability of the generation and emptiness problems and they are not necessary given the solution that this article identifies.
A separate line of research has examined whether there are restrictions on the form of LFG rule annotations that might reduce the complexity class of the formalism to a system with tractable recognition and generation algorithms. In particular, Seki et al. (1993) consider LFG grammars where each right-hand side category is annotated with exactly one function-assigning schema of the form (↑ f) = ↓ and finitely many atomic-valued annotations of the form (↑ a) = v. They show that the class of languages generated by these LFG grammars is equal to that generated by tree-to-string finite-state translation systems. Because the recognition problem for tree-to-string finite-state translation systems is still NP-complete, Seki et al. (1993) considered in addition the condition that the number of nodes that relate to the same f-structure element is finitely bounded. This restriction limits the number of dependent c-structure paths and finally ensures that the resulting “finite-copying” LFG grammars are mildly context-sensitive and hence tractable. Vijay-Shanker et al. (1987) and Weir (1988) identified several representatives of this class, among them linear context-free rewriting systems (LCFRSs). These have subsequently been shown to be equivalent to multiple context-free grammars (Seki et al. 1991) and simple range concatenation grammars (Boullier 2000) (see also Kallmeyer [2010b] for a useful survey of mildly context-sensitive grammar formalisms).
These tractability-motivated restrictions on the formalism are quite severe and seem to exclude the possibility of expressing some of the key principles and devices of LFG theory. For example, among the common annotations that these restrictions disallow are the trivial ↑ = ↓ equations that mark the heads of constituents, multi-attribute value specifications, such as (↑ subj num) = sg, that encode agreement requirements, and reentrancy equations, such as (↑ xcomp subj) = (↑ subj), that represent functional control. However, these restrictions provide a useful starting point for identifying other formal properties that guarantee tractability but still allow linguistic generalizations to be stated perspicuously. Natural language grammars exploit more of the LFG formalism than the Seki et al. analysis would allow, but they do not make arbitrary use of its expressive power.
The computational advantage of these previous approaches depended on severely limiting the form of functional annotations and how they are distributed across the rules of a grammar. Here, we relax the restrictions on notation to allow a broader range of annotations as are more typically used in linguistic descriptions. We show that tractability and decidability can still be established with this linguistically more suitable notation if the propagation of atomic-value information across a derivation is regulated in other ways. We characterize a subclass of LFG grammars with properties that guarantee limitations on the flow of information and yet also seem compatible with the way that the LFG formalism is deployed in linguistic practice. In this article we focus on the descriptive devices as originally proposed by Kaplan and Bresnan (1982) and still in common use, leaving to later work the more sophisticated extensions to the formalism (e.g., functional uncertainty) that were later introduced and are now in common use. The Kaplan-Bresnan formalism is still the core of the theory and resolving its computational issues is a prerequisite for the future analysis of any additional mechanisms.
The organization of this article is as follows. In the next section we define a primitive subclass of LFG grammars, the basic LFGs, and characterize their derivations and languages. Basic LFGs are stripped down versions of grammars in the Kaplan and Bresnan (1982) formalism that retain the essential equational mechanism of this and other unification-based formalisms. They provide a simplified setting for establishing the properties crucial for controlling the flow of atomic-value information; the additional Kaplan-Bresnan formal devices are simple augmentations of this basic machinery. In Section 3 we introduce a subclass of basic LFG grammars that allow for trivial annotations and reentrancies of a particularly limited form. These are required for linguistic description but have unacceptable formal and computational properties. We therefore review the notational and derivational restrictions of the finite-copying subclass (Seki et al. 1993), which is known to be LCFRS equivalent and computationally tractable, and we argue in Section 4 that a Seki et al.-style boundedness condition must be paired with a nonconstructivity condition on the reentrancies to achieve LCFRS equivalence and thus to preserve the key advantages of Seki et al.’s strictly less expressive formalism. We claim that these restrictions are compatible with linguistic description and show their decidability. We then provide in Section 5 a constructive proof of the equivalence between grammars in this linguistically relevant subclass and LCFRSs, and we demonstrate that the LCFRS equivalence extends to LFG grammars that make use of the richer set of descriptive devices originally proposed by Kaplan and Bresnan (1982). Section 6 highlights the beneficial effects on computational performance of a broader set of LFG principles and explores possible implementation difficulties and alternative parsing strategies. The last section summarizes our results and provides a brief discussion of open issues.
2. Preliminaries
We start with a formal characterization of LFG grammars with only equational statements and without any parsability constraints. These are called basic LFGs. We define the notation for this broad class of grammars and specify how that notation is interpreted in derivations that establish the correspondence between strings, functional descriptions, and functional structures. Let Σ* denote the set of all finite strings over some finite set of symbols Σ and Σ+ = Σ*∖{ϵ}, with ϵ denoting the empty string. Then a basic LFG grammar G is defined as follows:
Definition 1
A basic LFG assigns to each sentence in its language at least one constituent structure (c-structure) and at least one f-structure. The f-structure is the minimal solution for a set of instantiated annotations (f-description) that is obtained by instantiating the annotations of rules that license the local mother–daughter configurations of the c-structure. For each such rule, all occurrences of the ↑ symbol (called a metavariable) in the annotations of the daughters are replaced by the mother node, and for each of the daughter categories, all occurrences of the ↓ metavariable in its annotations are replaced by the corresponding daughter node. This instantiation procedure uses the nodes themselves instead of the related f-structure variables of Kaplan and Bresnan (1982) or the more explicit ϕ projection of the LFG correspondence architecture (Kaplan 1995). (Node instantiation is mathematically equivalent to the other representations but is better suited to present purposes.) The f-structure is the unique minimal solution (minimal model) of the f-description.
To be more explicit, instantiated descriptions are obtained from the rules by substituting for the ↑ and ↓ metavariables elements drawn from a collection of elements. C-structure nodes are included in the collection, but later on we also make use of other elements. We define a function Inst that assigns to each m-ary rule r, element b, and sequence of elements β of length m the instantiated description that is obtained from the annotations of r by substituting b for ↑ and βj for ↓ in the annotations of all j = 1, .., m daughters. In the definition below we use the (more compact) linear rule notation A → (X1, D1)..(Xm, Dm) that we prefer in more formal specifications. (Using this notation, epsilon rules are defined as unary rules (m = 1) with X1 = ϵ.) Moreover, for a set of formulas D we let D[c1/b1, .., cn/bn] denote the set of formulas obtained from D by simultaneously replacing all occurrences of ci by the corresponding element bi, for i = 1, .., n.
Definition 2
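As an informal illustration of the instantiation function described above (not a restatement of the formal definition), the substitution of elements for the ↑ and ↓ metavariables can be sketched in Python. The encoding of terms as `(head, path)` pairs, the rule format, and the sample rule S → NP VP are assumptions made only for this sketch.

```python
UP, DOWN = "↑", "↓"

def substitute(term, up, down):
    """Replace a metavariable at the head of a term, keeping its attribute path."""
    head, path = term
    if head == UP:
        head = up
    elif head == DOWN:
        head = down
    return (head, path)

def inst(rule, b, beta):
    """Instantiate the annotations of an m-ary rule.

    rule: (lhs, [(category, annotations), ...]), where annotations is a list
          of equations (left_term, right_term)   (hypothetical encoding)
    b:    element substituted for ↑
    beta: sequence of m elements; beta[j] is substituted for ↓ on daughter j
    """
    lhs, rhs = rule
    if len(rhs) != len(beta):
        raise ValueError("need exactly one element per right-hand side category")
    description = set()
    for (category, annotations), daughter in zip(rhs, beta):
        for left, right in annotations:
            description.add((substitute(left, b, daughter),
                             substitute(right, b, daughter)))
    return description

# Hypothetical rule S -> NP VP with (↑ s) = ↓ on NP and ↑ = ↓ on VP.
rule = ("S", [("NP", [((UP, ("s",)), (DOWN, ()))]),
              ("VP", [((UP, ()), (DOWN, ()))])])
fd = inst(rule, "n0", ["n1", "n2"])
```

Here `fd` contains the instantiated equations (n0 s) = n1 and n0 = n2, with c-structure nodes standing in for the metavariables.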
Before we present the usual definition of LFG derivations, we first define derivations of strings and their instantiated functional descriptions without requiring the f-descriptions to be clash-free (consistent), that is, without requiring that there be an associated f-structure. These unevaluated derivations are useful for our purposes because we can exploit certain properties of the f-descriptions even for grammar classes whose recognition, emptiness, and generation problems are known to be undecidable. In the definitions, we do not require the label of the root node to be S nor the yield to be a terminal string. This generality is needed for conducting inductive proofs both top–down and bottom–up. We further assume that root is the root node of any c-structure c, and that dts is a function that assigns to each nonterminal node n of c the sequence of its immediate daughters.
Definition 3
A pair (c, ρ) consisting of a labeled tree c and a mapping ρ from the nonterminal nodes of c into R is a(n unevaluated) derivation of string s ∈ (N ∪ T)* from B with functional description FD in G iff
- (i) the label of root is B,
- (ii) the yield is s,
- (iii) for each nonterminal node n with label A and dts(n) = n1..nm with labels X1, .., Xm, respectively, ρn = A → (X1, D1)..(Xm, Dm),
- (iv) FD = ⋃ Inst(ρn, (n, dts(n))), where the union ranges over all nonterminal nodes n of c.
Definition 4
A functional description FD is clash-free iff
- (i) FD ⊬ a = v, where v ∈ 𝒱 and a is any other constant (atomic feature value or node) occurring in FD,
- (ii) FD ⊬ t = (v σ), where v ∈ 𝒱, σ ∈ 𝒜+, and t is an arbitrary term.
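The two clash conditions can be checked mechanically. The following Python sketch approximates derivability (⊢) by a unification-style closure over a feature graph: terms are `(head, path)` pairs, equal terms are merged with union-find, and attribute edges are merged along with them. The encoding and the function name `clash_free` are assumptions of this sketch, not part of the formalism.

```python
from itertools import count

def clash_free(fd, atoms):
    """Return True iff the equations in fd produce neither kind of clash.

    fd:    iterable of equations (left_term, right_term), term = (head, path)
    atoms: the atomic feature values (the set called 𝒱 in the text)
    """
    parent, edges, constants = {}, {}, set()
    fresh = count()

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(x, y):
        pending = [(x, y)]
        while pending:
            x, y = pending.pop()
            x, y = find(x), find(y)
            if x == y:
                continue
            parent[y] = x
            mine = edges.setdefault(x, {})
            # Merging two elements forces their same-named attributes to merge.
            for attr, target in edges.pop(y, {}).items():
                if attr in mine:
                    pending.append((mine[attr], target))
                else:
                    mine[attr] = target

    def denote(term):
        head, path = term
        constants.add(head)
        x = find(head)
        for attr in path:
            out = edges.setdefault(x, {})
            if attr not in out:
                out[attr] = ("_", next(fresh))   # fresh anonymous element
            x = find(out[attr])
        return x

    for left, right in fd:
        union(denote(left), denote(right))

    classes = {}
    for c in constants:
        classes.setdefault(find(c), set()).add(c)
    for rep, members in classes.items():
        if members & set(atoms):
            # (i) an atom equated with another constant; (ii) an atom with attributes
            if len(members) > 1 or edges.get(rep):
                return False
    return True
```

For instance, a description containing both (n0 a) = sg and (n0 a) = pl is rejected by condition (i), and one containing (n0 s) = sg together with (n0 s a) = pl is rejected by condition (ii).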
LFG derivations are then defined as follows.
Definition 5
A pair (c, ρ) consisting of a labeled tree c and a mapping ρ from the nonterminal nodes of c into R is a derivation of string s ∈ (N ∪ T)* from B with functional description FD and f-structure F in G iff
- (i) (c, ρ) is a derivation of s from B with f-description FD in G,
- (ii) FD is clash-free,
- (iii) F is isomorphic to M|𝒜∪𝒱, the restriction to 𝒜 ∪ 𝒱 of a minimal model M of FD.
The effect of condition (iii) is to abstract away from the nodes, their interpretation, and the universe of the given minimal model, none of which has linguistic significance. In contrast to the unevaluated derivations of Definition 3, we sometimes refer to derivations as defined in Definition 5 as valid derivations, although it will always be clear from the context which of the two notions is meant.
Finally, we define LFG’s derivability relation Δ and the language of an LFG grammar as follows.
Definition 6
As an example consider the LFG grammar G with the productions given in (1).
This grammar derives the language {n^n v^n ∣ n > 2}. It mimics the LFG grammar for Dutch cross-serial dependencies presented in Bresnan et al. (1982) in that it produces for a string n^n v^n an f-structure that is structurally equivalent to the f-structure assigned to a Dutch c-structure with preterminal string NP^n V^n. The attributes s, p, o, and x correspond to subj, pred, obj, and vcomp, respectively, of the original grammar, and a is a generic subject–verb agreement feature. However, in contrast to Bresnan et al.’s grammar, we use a more fine-grained category system and a boundary marker # to ensure that the different v’s are derived at the right positions and that the number of n’s and v’s properly match. This is just because the basic formalism does not include constraints and devices for enforcing the valency requirements of the individual predicates, the Completeness and Coherence Conditions, that usually account for that. Finally, for simplicity, we do not maintain the strict distinction between phrasal and lexical categories.
The terminal string nnnvvv, for example, is well-formed (nnnvvv ∈ L(G)), because it is derivable with the c- and f-structure depicted in Figure 1. The nonterminal nodes of the c-structure tree are explicitly specified since they are used for the instantiation of the rule annotations. The rule mapping ρ that justifies the c-structure is given in (2).
The f-structure of Figure 1 is associated with the given c-structure because it is obtained from the minimal model (3b) of the f-description (3a) by restricting the interpretation function to the attributes and atomic feature values in 𝒜 ∪ 𝒱, thus disregarding the nodes and their interpretation.
3. Linguistically Motivated Annotations and Finite-Copying LFGs
In this section we consider a proper subclass of basic LFGs that provides the primitive notation needed for linguistic analysis and we demonstrate that even for these notationally more restricted grammars the core computational problems are undecidable. We then introduce Seki et al.’s finite-copying grammars. These grammars are considerably more restricted in notation and their derivations are crucially required to satisfy a strong bounding condition. As indicated, finite-copying grammars admit of decidable and tractable solutions to key computational problems, but their limited expressivity makes them unsuited for theoretically motivated descriptions of natural languages.
3.1 Linguistically Motivated Annotations
The LFG grammars that we will consider in the following are characterized by short reentrancies with no more than two attributes. This respects the theoretical Principle of Functional Locality, the stipulation that designators in lexical and grammatical annotations can specify no more than two grammatical functions (Kaplan and Bresnan 1982, page 278).
Definition 7
An annotated reentrancy is an equation (or a symmetric variant) of one of the following forms:
(↑ f g) = (↑ h)
(↑ f) = (↑ h)
(↓ g) = (↑ h)
(↓ f) = (↓ h)
(↓ g) = ↑
↑ = (↑ h)
(↓ g) = ↓
Long reentrancies (with a two-attribute designator) may appear on terminal and nonterminal categories and are typically used for functional control (e.g., (↑ xcomp subj) = (↑ subj)).3 All other reentrancies contain at most one attribute on each side. Reentrancies of the form (↑ f) = (↑ h) might be used for clause-internal topicalization (e.g., (↑ topic) = (↑ obj)) and those of the form (↓ g) = (↑ h) for subject control of open adjuncts (xadjuncts) ((↓ subj) = (↑ subj)). The remaining reentrancies just cover other possibilities for equating the ↑/↓ metavariables with a one-attribute designator. The last two introduce immediate cycles and are not commonly attested in natural language grammars, but they introduce no additional expressive power or formal complexity since their effect can be achieved by combinations of other reentrancies on the list. We also admit atomic-valued annotations of the form (↑ σ) = v or (↓ σ) = v, with |σ| > 0, and we allow each nonterminal to be annotated either by ↑ = ↓ or exactly one function-assigning annotation of the form (↑ f) = ↓ or none of them. The precise definition of the class of LFG grammars with linguistically motivated rule annotations is given below.
Definition 8
A basic LFG G = (N, T, S, R) is suitably annotated if every annotated right-hand side rule category (X, D) satisfies the following conditions:
- (a) if X ∈ N, then D may contain at most one annotation of the form (↑ f) = ↓ or ↑ = ↓, any number of annotated reentrancies, and any number of atomic-value annotations of the form (↑ σ) = v or (↓ σ) = v, with |σ| > 0,
- (b) if X ∈ T or X = ϵ, then D may include annotated reentrancies not containing ↓ and atomic-value annotations of the form (↑ σ) = v, with |σ| > 0.
3.2 Undecidable Problems for Suitably Annotated LFGs
As previously noted, the emptiness, parsing, and generation problems are undecidable for unrestricted basic LFG grammars. Without further limitations these problems are also undecidable for suitably annotated LFGs. Wedekind (1999) and Wedekind (2014) provided simple proofs of these results for a slightly less restricted formalism using a reduction from the emptiness problem for the intersection of context-free languages. We adapt this proof strategy to the suitable notation in order to highlight the sources of computational complexity that reentrancies introduce and that must be controlled to ensure LCFRS equivalence.
Following Wedekind (1999) we demonstrate the undecidability of the emptiness problem by constructing for each context-free grammar G = (N, T, S, R) in Chomsky normal form a suitably annotated LFG grammar String(G) such that L(String(G)) = L(G) and the f-structure for each derivable string s contains a simple encoding of s and a representation of a derivation in G that leads to that terminal string. For each terminal rule A → a in R the rule set of the string grammar contains a rule of the form (4a), and for each nonterminal rule A → CD it contains a rule of the form (4b).
The attributes b, e, m, f, and t are mnemonic for “begin”, “end”, “middle”, “first”, and “tail”, respectively. Derivations are threaded through the b(egin), m(iddle), and e(nd) attributes and the f(irst) and t(ail) attributes encode linked-list representations of terminal strings. As illustrated by the c-structure/f-structure pair in Figure 2, an atomic f value corresponds to a terminal at a given string position and the t value sequence represents the suffix of the string from that position. The yield of each context-free constituent is encoded by the b(egin) and e(nd) attributes of the corresponding f-structure, and the path between the root b and e values represents the entire string. The m(iddle) attribute in (4b) is an auxiliary used to link the daughter yields of binary constituents.
Let G1 = (N1, T1, S1, R1) and G2 = (N2, T2, S2, R2) be two arbitrary context-free grammars in Chomsky normal form with N1 ∩ N2 = ∅. Construct an LFG grammar G = (N, T, S, R) with N = N1 ∪ N2 ∪ {S}, S ∉ N1 ∪ N2, and T = T1 ∪ T2. R consists of the rules of String(G1) and String(G2) and the following start rule:
Then for any valid derivation of a terminal string s in G, s has the form s1s2, with s1 derived from S1 and s2 derived from S2, and s1 = s2, because both string encodings are unified by the S rule and the instantiations of (↑ e f) = # ensure that one string is not a proper prefix of the other. Thus L(G) = ∅ iff L(G1) ∩ L(G2) = ∅.
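The role of the end marker can be illustrated with a small Python sketch of the first/tail encoding. It models only the linked-list part of the f-structure as nested dictionaries and checks whether two such tree-shaped encodings can be unified without a clash. The functions `encode` and `unifiable` are hypothetical helpers for this illustration and do not reproduce the rules of the string grammar.

```python
def encode(s, end="#"):
    """Encode string s as a first/tail list; the final cell carries the end marker.

    With end=None no marker is added, which models the encoding without (↑ e f) = #.
    """
    cell = {"f": end} if end is not None else {}
    for symbol in reversed(s):
        cell = {"f": symbol, "t": cell}
    return cell

def unifiable(x, y):
    """True iff two tree-shaped structures can be unified without a clash."""
    if isinstance(x, str) or isinstance(y, str):
        return x == y            # atoms unify only with identical atoms
    return all(unifiable(x[k], y[k]) for k in x.keys() & y.keys())

same = unifiable(encode("ab"), encode("ab"))              # identical strings
prefix = unifiable(encode("a"), encode("ab"))             # blocked by the marker
prefix_unmarked = unifiable(encode("a", None), encode("ab", None))
```

In this sketch `same` is true and `prefix` is false, whereas `prefix_unmarked` is true: without the end marker the encoding of a proper prefix unifies with the encoding of the longer string, which is exactly what the # annotation rules out.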
The undecidability of the parsing/recognition problem follows from a simple modification to the string-grammar specification and, again, the emptiness problem for context-free intersection. If we construct for each terminal rule A → a two rules of the form
where Aa is a new nonterminal, for each terminal a, then ϵ ∈ L(G) if and only if L(G1) ∩ L(G2) ≠ ∅.4
String grammars defined with rules (4–6) contain short reentrancies on the nonterminals and long reentrancies on the terminals or preterminals but no annotated function assignments or trivial annotations. This demonstrates that complexity can arise from just the interaction of structure sharing constraints on f-structure features without the support of node equalities.
An alternative formulation of String(G) shows, on the other hand, that combining just one kind of reentrancy, the short ones, with function assignments can lead to the same undecidability results. Long reentrancies only appear in rules of the form (4a). They can be eliminated in favor of function assignments by using for each terminal rule A → a two rules of the form (7a, b) (emptiness problem) or (7a, c) (recognition problem)
where Aa is a new nonterminal, for each terminal a.
The terminal strings can also be encoded as descending chains of attributes rather than as first-rest sequences of atomic values.5 A sample c- and f-structure are shown in Figure 3. Because such string grammars do not contain atomic-valued annotations, they provide a powerful proof-theoretic tool for establishing that certain more specific questions about the interaction of function assignments and reentrancies remain undecidable even in the absence of the filtering effect of atomic-valued annotations.
We obtain such string grammars by combining nonterminal rules of the form (4b) with terminal rules of the form (8).
If the revised string grammars are combined with the start rule (9), then any problem whose solution can be made dependent on the derivability of node equalities can be shown to be undecidable for suitably annotated LFGs without atomic-valued annotations.
By construction, the function assignments on X and Y will result in a node equality only if the instantiations of the two (↓ e) designators have the same value. This happens only if S1 and S2 derive the same string s (s ∈ L(G1) ∩ L(G2)). Thus, if we add epsilon rules for X and Y then this construction shows that it is in general undecidable whether there are derivations with co-referring nodes in such grammars.
If we construct for each terminal rule A → a string grammar rules with only short reentrancies as in (10),
then the more elaborated X and Y expansions in (11) can be used to show that, even in absence of clashes, it is undecidable whether there are derivations in which all long reentrancies are equivalent to a combination of function assignments and short reentrancies.
The undecidability of these different configurations of annotations is an important diagnostic in our attempt at arriving at LCFRS equivalence: The emptiness and recognition problems are decidable for LCFRS, so further restrictions must be applied to any configuration where these problems are undecidable. For our further considerations it is therefore important to note that the use of short reentrancies must be restricted if they are combined with function assignments, and that the use of long reentrancies must be severely restricted.6 This is because it is even undecidable whether they reduce to short ones.
3.3 Finite-Copying LFGs
The finite-copying grammars of Seki et al. form a subclass of the suitably annotated LFGs that are known to have a tractable recognition problem and a decidable emptiness problem. These results are obtained by limiting the notation for rule annotations to include only direct function assignments of the form (↑ f) = ↓ and atomic-value annotations of the form (↑ a) = v. In addition the derivations of a finite-copying grammar must respect a global boundedness restriction. Finite-copying LFGs are effectively defined as follows.
Definition 9
A suitably annotated LFG G = (N, T, S, R) is a finite-copying LFG iff
- (i) every rule A → (X1, D1)..(Xm, Dm) in R satisfies the following conditions:
  - (a) if Xj ∈ N, then Dj may contain at most one annotation of the form (↑ f) = ↓ and any number of annotations of the form (↑ a) = v,
  - (b) if Xj ∈ T, then Dj may only contain annotations of the form (↑ a) = v,
- (ii) for every valid derivation (c, ρ) of a terminal string from S with f-description FD, the number of nodes of c that relate to the same f-structure element is not greater than k, that is, |{n′ ∣ FD ⊢ n = n′}| ≤ k, for any node n of c.
In this definition, condition (i) defines the format of the rule annotations, indicating that finite-copying LFGs are a quite restricted subclass of suitably annotated LFGs. Specifically, they do not allow trivial annotations or annotations that give rise to f-structure reentrancies, the annotations that are required for linguistic description.
Given these notational restrictions, structure sharing can only be achieved through instantiated function-assigning annotations. This specific type of structure sharing is occasionally referred to as “zipper” unification. That is, if two distinct nodes n and n′ co-refer (have the same f-structure) in a valid derivation (FD ⊢ n = n′), then there must always be a node dominating these nodes, and the sequences of function-assigning annotations obtained from the annotations on the paths from that dominating node to n and n′, respectively, must be identical, that is, form a “zipper”. As a consequence, co-referring nodes must have the same c-structure depth.
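Zipper unification and the size of the resulting co-reference classes can be sketched in Python for descriptions that contain only instantiated function assignments. The triple format `(mother, attribute, daughter)` for an equation (mother attribute) = daughter and the function name `coreference_bound` are assumptions of this sketch.

```python
from collections import Counter

def coreference_bound(assignments):
    """Largest number of nodes mapped to the same f-structure element.

    assignments: list of (mother, attribute, daughter) triples, each standing
                 for an instantiated annotation (mother attribute) = daughter.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    changed = True
    while changed:                       # iterate until no more nodes merge
        changed = False
        seen = {}
        for mother, attr, daughter in assignments:
            key = (find(mother), attr)
            if key in seen:
                a, b = find(seen[key]), find(daughter)
                if a != b:               # same attribute of co-referring mothers
                    parent[b] = a
                    changed = True
            else:
                seen[key] = daughter

    nodes = {n for m, _, d in assignments for n in (m, d)}
    sizes = Counter(find(n) for n in nodes)
    return max(sizes.values()) if sizes else 0

# Two daughters of r share the function f, and so do their own daughters.
zipper = [("r", "f", "a1"), ("r", "f", "a2"),
          ("a1", "f", "b1"), ("a2", "f", "b2")]
bound = coreference_bound(zipper)
```

In this example the nodes a1 and a2 co-refer, and so do b1 and b2, so the derivation respects the bounding condition for k = 2 but not for k = 1.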
The bounding condition (ii) limits the number of non-local dependencies or co-referring nodes that can arise through structure sharing and thus excludes c-structure recursions that give rise to zippers of size greater than any constant k. Indeed, Seki et al. have shown that the recognition problem is NP-complete for nondeterministic copying LFGs, which are similar to finite-copying LFGs except that they are not required to satisfy the bounding condition. Thus the bounding condition is crucial for tractable performance even with the severe notational restrictions of the finite-copying formalism. Consider for example the (suitably annotated) LFG grammar with the rules in (12).
The unbounded recursion of this grammar derives the language {a^(2^n) ∣ n ≥ 0} with zippers that also grow exponentially. This language is not of constant growth, since the gap between the consecutive string lengths 2^n and 2^(n+1) is 2^n and thus grows without bound, and hence it is not in the class of mildly context-sensitive languages (Joshi 1985). Seki et al. observe, however, that it is decidable whether a grammar notationally restricted according to condition (i) also satisfies the bounding condition (ii) and therefore has tractable computational properties.
4. k-Bounded LFG Grammars
In this section, we define a set of restrictions on LFG grammars with linguistically motivated annotations that are compatible with linguistic description and together ensure LCFRS equivalence. These restrictions thus preserve the key advantages of Seki et al.’s minimally expressive formalism.
Because finite boundedness is a necessary condition for LCFRS equivalence, clearly there must be a finite bound on the number of nodes in a functional domain.7 These nodes are annotated with ↑ = ↓ annotations and typically carry information about heads, coheads, and the local morphosyntactic features of a functional unit. The ↑ = ↓ annotations map all the nodes in a functional domain to the same f-structure, and thus allow information to propagate up, down, and across the chain of nodes that relate to a single head.
Definition 10
An LFG grammar G has (height-)bounded functional domains if for any sequence of rules A1 → ϕ1(A2, D2)ψ1, A2 → ϕ2(A3, D3)ψ2, .., An → ϕn(A1, D1)ψn of length n ≥ 1, with Ai ≠ Aj, for 1 ≤ i < j ≤ n, ↑ = ↓ ∈ Di does not hold for all i = 1, .., n.
In the following, we refer to LFG grammars with linguistically suitable annotations and height-bounded functional domains as suitable LFGs.
Definition 11
A basic LFG G is called suitable if G is suitably annotated and G’s functional domains are height-bounded.8
Whether a basic LFG G is suitable is easily decidable by inspecting G’s rules and rule sequences of length less than or equal to |N|.
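The height-boundedness part of this check can be sketched in Python as cycle detection, under the reading that Definition 10 forbids any cycle of categories linked exclusively by ↑ = ↓ annotations. The rule encoding, the string marker `"↑=↓"`, and the function name are assumptions of this sketch; the test for suitable annotation (Definition 8) is not covered.

```python
TRIVIAL = "↑=↓"

def has_bounded_functional_domains(rules):
    """True iff no cycle of categories is linked only by ↑ = ↓ annotations.

    rules: list of (lhs, [(category, annotations), ...]), where annotations
           is a collection of annotation strings (hypothetical encoding).
    """
    graph = {}
    for lhs, rhs in rules:
        for category, annotations in rhs:
            if TRIVIAL in annotations:
                graph.setdefault(lhs, set()).add(category)

    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(a):
        colour[a] = GREY
        for b in graph.get(a, ()):
            state = colour.get(b, WHITE)
            if state == GREY:                 # back edge: a ↑ = ↓ cycle
                return False
            if state == WHITE and not visit(b):
                return False
        colour[a] = BLACK
        return True

    return all(colour.get(a, WHITE) != WHITE or visit(a) for a in list(graph))

# Hypothetical rules: the head chain S -> VP -> V is finite.
rules = [("S", [("NP", ["(↑ s)=↓"]), ("VP", [TRIVIAL])]),
         ("VP", [("V", [TRIVIAL]), ("NP", ["(↑ o)=↓"])])]
bounded = has_bounded_functional_domains(rules)
# Adding a rule VP -> VP PP with ↑ = ↓ on VP creates an unbounded domain.
unbounded = has_bounded_functional_domains(
    rules + [("VP", [("VP", [TRIVIAL]), ("PP", ["(↑ x)=↓"])])])
```

A depth-first search visits each category and each ↑ = ↓ link once, so the check is linear in the size of the rule set.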
Because of the height bound, there is a simple transformation of a grammar G into a strongly equivalent LFG grammar G∖↑=↓ that no longer contains ↑ = ↓ annotations. The transformation is accomplished by recursively replacing a category annotated with ↑ = ↓ in the right side of one rule by the right sides of all the rules expanding that category, and making the appropriate replacements of ↑ for ↓ to preserve the f-structure mappings. The detailed construction and the equivalence proof are provided in Appendix A.
Lemma 1
For any suitable LFG G we can construct a suitable LFG without ↑ = ↓ annotations, denoted by G∖↑=↓, such that Δ_G = Δ_{G∖↑=↓}.
For the suitable LFG grammar in (1), for example, we then obtain a ↑ = ↓-free grammar whose useful rules (that is, the ones whose left-hand category is reachable from the start symbol) are given in (13).9
It is straightforward to see that for suitable LFGs it must also be undecidable whether derivations respect any bound k, k ≥ 1, of the sort that Seki et al. used in the definition of finite-copying grammars. Note first that the emptiness problem is already undecidable for 1-bounded suitable LFGs as the grammars with rules of the form (4, 5) are trivial-free and 1-bounded. With two suitable LFGs G_ub and G_b that are known to be unbounded and k-bounded, respectively, we can easily reduce the emptiness problem to the k- or finite-boundedness problem. Given an arbitrary 1-bounded suitable LFG G, we construct a suitable grammar G′ with a rule set consisting of the rules of G, G_ub, and G_b, and the two start rules S′ → S S_ub and S′ → S_b (with N, N_ub, N_b, and {S′} pairwise disjoint). Then by construction, G′ is k-bounded iff L(G) = ∅.
We therefore rely on the observation that every LFG grammar G can be decomposed into two subgrammars, a reentrancy-free kernel and an atom-free kernel, both with a decidable emptiness problem. The reentrancy-free kernel of G is formed by removing all reentrancies from its rules, leaving just function assignments and atomic-value annotations. The atom-free kernel is formed by removing all atomic-value annotations from its rules, so that only function assignments and reentrancies remain. The decomposition then allows us to define some necessary and sufficient restrictions on the LFG grammars G by carefully regulating the interplay of the functional descriptions of these two grammars. Formally, we state all further restrictions on the reentrancy-free and atom-free kernels and their interaction in terms of the descriptions FD∖𝓡 and FD∖𝒜 obtained from FD by removing all instantiated reentrancies and all instantiated atomic-valued annotations, respectively.
Because k-boundedness is a necessary condition for LCFRS equivalence we first require the reentrancy-free kernel of G∖↑=↓ to be k-bounded. We show in Section 4.1 that k-boundedness is decidable for the reentrancy-free kernel of any suitable LFG.
Definition 12
A suitable LFG G has a k-bounded reentrancy-free kernel iff for every derivation (c, ρ) of a terminal string from S with f-description FD and clash-free FD∖𝓡 in G∖↑=↓, |{n′ ∣ FD∖𝓡 ⊢ n = n′}| ≤ k, for any node n of c.
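For a single derivation, the condition of Definition 12 can be checked mechanically. The following is a minimal sketch, not part of the formal development: it assumes that the instantiated function assignments of FD∖𝓡 are given as triples (mother, attribute, daughter), and the names `coreference_classes` and `is_k_bounded_here` as well as the example description are hypothetical.

```python
from collections import defaultdict

def coreference_classes(nodes, assignments):
    """Partition nodes into classes that the function assignments force to be equal.

    assignments: (mother, attribute, daughter) triples, each standing for an
    instantiated function assignment (mother attribute) = daughter.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    changed = True
    while changed:  # repeat until no further equalities are forced
        changed = False
        value_of = {}  # (class representative, attribute) -> daughter representative
        for mother, attr, daughter in assignments:
            key = (find(mother), attr)
            d = find(daughter)
            if key in value_of and find(value_of[key]) != d:
                # two daughters are the same function of the same f-structure
                parent[d] = find(value_of[key])
                changed = True
            else:
                value_of.setdefault(key, d)

    classes = defaultdict(set)
    for n in nodes:
        classes[find(n)].add(n)
    return list(classes.values())

def is_k_bounded_here(nodes, assignments, k):
    """True if no class of co-referring nodes in this derivation exceeds k."""
    return max(len(c) for c in coreference_classes(nodes, assignments)) <= k

# Hypothetical description: both n3 and n4 are the x of root, so they co-refer.
NODES = ["root", "n1", "n2", "n3", "n4", "n5", "n6"]
FD_NO_REENTRANCIES = [("root", "s", "n1"), ("root", "x", "n3"),
                      ("root", "x", "n4"), ("n3", "x", "n6")]
```

The check covers one derivation only; Definition 12 quantifies over all derivations, which is why the decidability argument of Section 4.1 is needed.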
We now identify a minimally intrusive restriction that ensures that G∖↑=↓ conserves the ϕ projection of its reentrancy-free kernel, so that in particular G∖↑=↓ is k-bounded if its reentrancy-free kernel is k-bounded. This restriction relates to another notion in the LFG literature, the concept of “nonconstructivity” that has been discussed in the context of functional uncertainty and off-path constraints (Zaenen and Kaplan 1995, Section 3; Dalrymple et al. 1995a, page 133; Crouch et al. 2008, chapter on grammatical notations). We provide a technical formulation of this notion and propose it as a general condition on the operation of reentrancy annotations, a condition that is necessary to ensure LCFRS equivalence. Nonconstructive reentrancies are defined in terms of the interplay between the reentrancy-free and the atom-free kernel in the following way:
Definition 13
A suitable LFG with k-bounded reentrancy-free kernel has nonconstructive reentrancies iff for every derivation (c, ρ) of a terminal string from S with f-description FD and clash-free FD∖𝓡 in G∖↑=↓
if FD∖𝒜 ⊢ n = n′, with n distinct from n′, then FD∖𝓡 ⊢ n = n′.
This restriction ensures that reentrancies in G∖↑=↓ cannot interact with the function-assigning annotations of the reentrancy-free kernel to produce node equalities beyond those that follow just from that kernel’s more limited set of annotations. We show in Section 4.2 that nonconstructivity is decidable for suitable LFGs with k-bounded reentrancy-free kernel and only short reentrancies. Moreover, suitable LFGs with k-bounded reentrancy-free kernel and nonconstructive short reentrancies are equivalent to LCFRSs (see Section 5.2) and thus have a decidable emptiness problem. This, however, does not generalize to suitable grammars with k-bounded reentrancy-free kernel and long reentrancies. To see this, consider the grammars that we used in Section 3.2 to show that it is in general undecidable whether there are derivations with co-referring nodes. These grammars are trivial-free grammars with 1-bounded reentrancy-free kernel, but without atomic-valued annotations. Thus FD = FD∖𝒜, for any derivation. Moreover, because of the 1-boundedness of the reentrancy-free kernel, FD∖𝓡 ⊢ n = n′ does not hold for any pair of distinct nodes n and n′ in any derivation. Thus the nonconstructivity condition of Definition 13 is undecidable because FD∖𝒜 ⊢ n = n′ is undecidable.10
We observe, however, that long reentrancies of the form (↑ f g) = (↑ h), which are typically used to specify the structure sharing relationships in grammatical control constructions (e.g., (↑ xcomp subj) = (↑ obj)), can always be shortened in derivations that meet the requirements of the Coherence Condition.11 This is, specifically, because the controllee (subj) of an instantiated control equation (n xcomp subj) = (n obj) is a governable function in an open (xcomp) complement and therefore must be licensed by the complement’s semantic form. These licensing semantic forms are always introduced by simple pred equations associated with individual lexical entries, for example (↑ pred) = 'walk〈subj〉'. Thus, (↑ pred) = 'walk〈subj〉' must instantiate to the equation (n′ pred) = 'walk〈subj〉' at some node n′, and the functional description must also entail an equation (n xcomp) = n′ that links the complement to the higher clause. This (derived) function assignment justifies the reduction of the long reentrancy to the equivalent shorter one (n′ subj) = (n obj).
Certainly, we cannot anticipate this effect of the Coherence Condition just by removing from consideration all derivations in which long reentrancies cannot be reduced to shorter ones: Such an additional filter on valid derivations would preserve the undecidability of the emptiness problem. And this would, as demonstrated in Section 3.2, even apply to the weaker filter where the shortening equations are required to follow from the atom-free subset of FD. Hence those grammars cannot be equivalent to LCFRSs (because LCFRSs have a decidable emptiness problem). Therefore, we require the derivations to meet the stronger stipulation that the shortening equations (n xcomp) = n′ follow from the f-description of the reentrancy-free kernel, which excludes the use of other reentrancies in those supporting inferences. This is made explicit in the following extension to the notion of derivation.
Definition 14
A pair (c, ρ) consisting of a labeled tree c and a mapping ρ from the nonterminal nodes of c into R is a derivation of string s ∈ (N ∪ T)* from B with functional description FD in G iff c and ρ satisfy the conditions (i–iv) of Definition 3 and
(v) if (n f g) = (n h) ∈ FD, then FD∖𝓡 ⊢ (n f) = n′ for some node n′.
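Condition (v) and the shortening step it licenses can be sketched as follows. This is an illustration under simplifying assumptions: the function assignments derivable from FD∖𝓡 are taken to be available as a lookup table from (node, attribute) pairs to nodes, and the name `shorten_reentrancy` is hypothetical.

```python
def shorten_reentrancy(long_eq, assignments):
    """Shorten (n f g) = (m h) to (n2 g) = (m h), given (n f) = n2.

    long_eq: ((n, f, g), (m, h)), standing for the long reentrancy (n f g) = (m h).
    assignments: dict mapping (node, attribute) to a node; assumed to list the
        function assignments that follow from the reentrancy-free description.
    Returns the shortened equation, or None when condition (v) is not met.
    """
    (n, f, g), rhs = long_eq
    target = assignments.get((n, f))
    if target is None:
        return None  # no supporting function assignment: condition (v) fails
    return ((target, g), rhs)

# The control equation (n xcomp subj) = (n obj) with (n xcomp) = n1.
CONTROL = (("n", "xcomp", "subj"), ("n", "obj"))
SHORT = shorten_reentrancy(CONTROL, {("n", "xcomp"): "n1"})
```

Here `SHORT` stands for (n1 subj) = (n obj). With an empty table the function returns `None`, which corresponds to a pair (c, ρ) that Definition 14 does not admit as a derivation.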
Certainly, this stronger condition excludes some analyses that otherwise appear to lie within scope of the normal derivational machinery of the LFG formalism. In all likelihood the derivations that this restriction eliminates would also fail to meet the coherence condition and thus no linguistically significant derivations will be lost.
LFG grammars that meet all the restrictions we have set out are called k-bounded LFGs.
Definition 15
A basic LFG G is k-bounded iff
- (i)
G is suitably annotated,
- (ii)
G has height-bounded functional domains,
- (iii)
G’s reentrancy-free kernel is k-bounded,
- (iv)
G has nonconstructive reentrancies.
For k-bounded LFGs the recognition and emptiness problems are decidable because, as we show in Section 5.2, k-bounded LFGs are weakly equivalent to k-LCFRSs. Moreover, because of the nonconstructivity of the reentrancies and condition (v) of Definition 14, G∖↑=↓ cannot produce node equalities beyond those of its reentrancy-free kernel.
Corollary 1
Let G be a k-bounded LFG grammar. Then for every derivation (c, ρ) of a terminal string from S with clash-free f-description FD in G∖↑=↓
if FD ⊢ n = n′, with n distinct from n′, then FD∖𝓡 ⊢ n = n′.
Hence, the k-boundedness of G’s reentrancy-free kernel also extends to G∖↑=↓, although, because of the filtering effect of the atomic-valued annotations, the bound of G∖↑=↓ can be smaller than k.12
Corollary 2
Let G be a k-bounded LFG grammar. Then for every derivation (c, ρ) of a terminal string from S with clash-free f-description FD in G∖↑=↓, |{n′ ∣ FD ⊢ n = n′}| ≤ k, for any node n of c.
4.1 The k-Boundedness of the Reentrancy-Free Kernel
As for Seki et al.’s finite-copying grammars, structure sharing in reentrancy-free kernels of suitable LFGs can only be achieved through instantiated function-assigning annotations. Co-referring nodes therefore belong to identical (zipper) paths and lie at the same depth below a common ancestor node.
We exploit a shrinking argument to show that the k-boundedness of G’s reentrancy-free kernel can be determined by inspecting a sufficiently large but finite number of “small” derivations in G∖↑=↓. Shrinking of a derivation is accomplished by removing certain parts of that derivation, yielding a smaller derivation. To identify the conditions that allow shrinking of G∖↑=↓’s derivations, let (c, ρ) be a derivation of a terminal string from S with f-description FD and clash-free FD∖𝓡 in G∖↑=↓. Consider the sets of nodes that co-refer in FD∖𝓡, that is, the sets [n] = {n′ ∣ FD∖𝓡 ⊢ n = n′} where n is a nonterminal node. Set [n] = {n} if n does not occur in FD.
As an illustration, consider in Figure 4 the annotated c-structure of the terminal string nnnvvv in the ↑ = ↓-free grammar with the rules in (13). This more conventional way of depicting c-structures and their licensing rule mappings (Kaplan and Bresnan 1982) makes it easy to read the f-description from the licensed c-structure. The description FD∖𝓡 obtained from the f-description FD of this derivation is given in (14).
The equivalence classes that result from FD∖𝓡 are {root}, {n1}, {n2}, {n3, n4}, {n5}, and {n6}.
Next, we exhibit the atomic-valued information that the nodes in [n] inherit from the nodes in higher-level equivalence classes. For this purpose, we replace each occurrence of a node n of c in FD by [n] and denote this description by eFD. From the description in (14) we thus obtain the description eFD∖𝓡 in (15).
Here and in our further considerations we use the following closure construction to derive the inherited atomic-valued equations.
Definition 16
Let FD be a description obtained from linguistically suitable annotations by instantiating the metavariables with specific elements, such as classes, as denoted by a and b. Then the closure of FD is defined as the smallest set that includes FD and is closed under the rule: if (a f σ) = v is in the closure and (a f) = (b σ′) ∈ FD, then (b σ′ σ) = v is in the closure.
For the description provided by the reentrancy-free kernel it is always the case that σ′ is empty. If eFD∖𝓡 contains an atom-value equation ([n] f σ) = v and a function assignment ([n] f) = [n′], the closure of eFD∖𝓡 will contain the shortened value assignment ([n′] σ) = v. In our simple example we can apply ({root} s) = {n1} to ({root} s a) = c and ({n3, n4} x) = {n6} to ({n3, n4} x x) = #, but in these cases the resulting equations ({n1} a) = c, ({n6} x) = # are already contained in eFD∖𝓡.
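The closure rule of Definition 16 can be sketched as a simple worklist computation. The encoding is an assumption made for illustration: an atomic-valued equation (a σ) = v is a triple (a, σ, v) with σ a tuple of attributes, and an equation (a f) = (b σ′) is a quadruple (a, f, b, σ′). The sketch further assumes |σ′| ≤ 1, so that attribute sequences never grow and the loop terminates.

```python
def closure(atomic, links):
    """Close a set of atomic-valued equations under the rule of Definition 16.

    atomic: set of (a, sigma, v) for equations (a sigma) = v.
    links:  iterable of (a, f, b, sigma_prime) for equations (a f) = (b sigma_prime);
            sigma_prime is () for function assignments.
    """
    closed = set(atomic)
    worklist = list(closed)
    while worklist:
        a, path, v = worklist.pop()
        if not path:
            continue  # no leading attribute f to rewrite
        f, sigma = path[0], path[1:]
        for a2, f2, b, sigma_prime in links:
            if (a2, f2) == (a, f):
                derived = (b, tuple(sigma_prime) + sigma, v)
                if derived not in closed:
                    closed.add(derived)
                    worklist.append(derived)
    return closed

# The two inferences mentioned in the text, starting from the longer equations only.
ATOMIC = {("{root}", ("s", "a"), "c"), ("{n3,n4}", ("x", "x"), "#")}
LINKS = [("{root}", "s", "{n1}", ()), ("{n3,n4}", "x", "{n6}", ())]
CLOSED = closure(ATOMIC, LINKS)
```

For this input the closure additionally contains ({n1} a) = c and ({n6} x) = #, the two shortened equations discussed above.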
Shrinking depends on comparing the atomic-valued information that the closure associates with the node equivalence classes to find matching classes at different levels of a derivation. To that end, we abstract away from particular class instantiations by defining, for any description FD, a function hFD that assigns to an instantiating element a characterization of all and only the atomic-valued equations that a description associates with that element. This function is defined in (16).
Note that, by construction of the closure, the length of σ is bounded by the maximum length of the attribute sequences occurring in the atomic-valued annotations of G. In the following, this length is denoted by ℓ. The values of h are members of the set 𝓔 of all possible clash-free atomic-valued specifications.
Definition 17
We can shrink a derivation if there are node classes [nl] and [ň] where at least one node in [nl] dominates a node in [ň], h assigns to [nl] and [ň] the same sets of atomic-valued schemata, and there is a mapping g from [nl] to [ň] such that ni and g(ni) are licensed by the same rule (ρni = ρg(ni)), for each ni in [nl].
Shrinking is accomplished by replacing the subderivations under nodes in the equivalence class [nl] by the subderivations under nodes determined by the g mapping. Specifically, we define the replacement ([nl]) as the derivation produced from (c, ρ) by removing the subtree and rule mapping under each node ni in [nl] and then inserting under ni a copy of the subtree and rule mapping under g(ni). The shrinking process is illustrated in Figure 5 and formally defined as follows.
Definition 18
- (i)
at least one node in [nl] dominates a node in [ň],
- (ii)
h([nl]) = h([ň]),
- (iii)
there is a function g from [nl] to [ň] such that ρni = ρg(ni) for each ni in [nl],
Because sh(c, ρ)([nl], [ň]) is obviously a derivation in G∖↑=↓, it remains to be shown that the f-description that G’s reentrancy-free kernel provides for this derivation is clash-free. Let FDt be the f-description of the (partial) derivation obtained from (c, ρ) by removing the subtree and rule mapping under each node ni in [nl], and FDb be the union of the f-descriptions obtained from the copies of the subderivations rooted at g(ni). Then, by construction, sh(c, ρ)([nl], [ň]) gets assigned the f-description FD′ = FDt ∪ FDb. Certainly, and ∪ {ni = nk ∣ ni, nk ∈ [nl]} are both clash-free. Because node equalities can only be established through function assignments, ⊢ n = n′ if FD∖𝓡 ⊢ n = n′, for all n, n′ occurring in FDt. Thus {n′ ∣ ⊢ nl = n′} is equal to [nl], and, for the h mapping h′ of sh(c, ρ)([nl], [ň]), h′([nl]) = h([nl]) by condition (iii). Hence must be clash-free because of condition (ii).
Lemma 2
Let (c, ρ) be a derivation in a suitable LFG grammar G∖↑=↓ with clash-free FD∖𝓡 and [nl] and [ň] be a pair of classes satisfying (i–iii) of Definition 18. If FD′ is the f-description of sh(c, ρ)([nl], [ň]) (respectively sh1−1(c, ρ)([nl], [ň])), then FD′∖𝓡 is clash-free.
For the following, notice that sh1−1 preserves the multiplicities of the licensing rules and clash-freeness whereas sh only preserves clash-freeness, but both operations abstract from the left-to-right c-structure order. To see this, note first that we can assign to each [n] of a derivation a multiset (P, m) whose support P consists of the rules licensing the nodes in [n] (P = {ρn′ ∣ n′ ∈ [n]}) and whose multiplicity function m specifies for each rule r in P the number of nodes in [n] that are licensed by r (m(r) = |{n′ ∈ [n] ∣ ρn′ = r}| for all r ∈ P). Let fm be the function that assigns to each [n] both the multiset assigned to [n] and its h value, that is, fm([n]) = ((P, m), h([n])). Because there is a one-to-one correspondence g between [nl] and [ň] with ρni = ρg(ni) for each ni in [nl] if and only if the multisets assigned to [nl] and [ň] are identical, we can apply sh1−1 to a derivation if there are nodes nl and ň on a path with fm([nl]) = fm([ň]).
With respect to sh, observe first that by definition, sh(c, ρ) can be applied to a pair ([nl], [ň]), if the two classes agree in their h values (h([nl]) = h([ň])) and there is a mapping g from [nl] to [ň] such that ρni = ρg(ni), for each ni in [nl]. Because there is such a g mapping from [nl] to [ň] if and only if each rule that licenses a node in [nl] also licenses a node in [ň] ({ρni ∣ ni ∈ [nl]} ⊆ {ρň′ ∣ ň′ ∈ [ň]}), we can shrink if h([nl]) = h([ň]) and the support of the multiset assigned to [nl] is included in the support of the multiset assigned to [ň].
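The two applicability tests just described reduce to comparisons of rule multisets and their supports. The following sketch makes this concrete; it assumes that the rule mapping ρ is given as a dictionary from nodes to rule names, that the h values are supplied by the caller, and that the dominance condition (i) has been checked separately. All function names are hypothetical.

```python
from collections import Counter

def rule_multiset(node_class, rho):
    """Multiset (P, m) of the rules licensing the nodes of a class."""
    return Counter(rho[n] for n in node_class)

def can_shrink_one_to_one(upper, lower, rho, h_upper, h_lower):
    """Test for sh1-1: equal h values and identical rule multisets."""
    return (h_upper == h_lower
            and rule_multiset(upper, rho) == rule_multiset(lower, rho))

def can_shrink(upper, lower, rho, h_upper, h_lower):
    """Test for sh: equal h values and support inclusion."""
    return (h_upper == h_lower
            and set(rule_multiset(upper, rho)) <= set(rule_multiset(lower, rho)))

# Hypothetical classes: the upper class uses r1 twice, the lower class r1 and r2.
RHO = {"u1": "r1", "u2": "r1", "l1": "r1", "l2": "r2"}
UPPER, LOWER = {"u1", "u2"}, {"l1", "l2"}
```

In this example the supports satisfy {r1} ⊆ {r1, r2}, so the test for sh succeeds, whereas the multisets differ and the test for sh1−1 fails.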
Now we demonstrate that it can be decided whether the reentrancy-free kernel of a suitable LFG G is k-bounded by inspecting derivations in G∖↑=↓ whose (c-structure) depth does not exceed some fixed upper bound.
Lemma 3
For any suitable LFG G and any constant k, it is decidable whether G’s reentrancy-free kernel is k-bounded.13
Proof
Suppose there is a derivation (c, ρ) of a terminal string in G∖↑=↓ that violates the k-boundedness condition. Let n′ be one of the topmost nodes of c with |[n′]| > k and consider the equivalence classes that contain the nodes of the path from root to the mother of n′. Observe first that for any of these equivalence classes [nl] and [ň] that satisfy the conditions for applying sh1−1(c, ρ), sh1−1(c, ρ)([nl], [ň]) must still violate the k-boundedness condition because sh1−1 preserves the multiplicities of the licensing rules and therefore all nodes in [n′] are still dominated by nodes in [ň]. This situation is illustrated in Figure 6.
To determine the upper bound on the path length from root to n′, recall that we can shrink the derivation if there are nodes nl and ň on the path from root to the mother of n′ with fm([nl]) = fm([ň]). Because the set of all multisets over R with cardinality less than or equal to k, in the following denoted by 𝓜≤k(R), is finite and the number of h values is bounded by |𝓔|, a path from root to n′ longer than |𝓜≤k(R) × 𝓔| must contain equivalence classes that satisfy the conditions for shrinking. Thus, if the grammar violates the k-boundedness condition then there must be a derivation with a class [n′], with |[n′]| > k, and the depth of n′ is less than or equal to |𝓜≤k(R) × 𝓔| (recall here that the depth of root is 0).
Now we determine an upper bound on the path length through the remaining equivalence classes of the derivation. If we can apply sh(c, ρ) to a pair ([nl], [ň]) of these classes (with a node in [nl] properly dominating a node in [ň]), then, obviously, the violating class [n′] remains in sh(c, ρ)([nl], [ň]). Recall that we can apply sh(c, ρ) to ([nl], [ň]) if h([nl]) = h([ň]) and if the support of the rule multiset assigned to [nl] is included in the support of the one assigned to [ň]. Because the supports are nonempty subsets of R, shrinking with sh(c, ρ) is thus possible if the path length between two nodes is greater than or equal to |Pow+(R) × 𝓔|, where Pow+(R) denotes the set of all nonempty subsets of R.
Thus if G∖↑=↓ violates the k-boundedness condition then there must be a derivation (c, ρ) with a c-structure depth less than or equal to |𝓜≤k(R) × 𝓔| + |Pow+(R) × 𝓔| + 1 that violates k-boundedness. Hence, for any suitable LFG G it can be determined whether G’s reentrancy-free kernel is k-bounded.
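To give an impression of the size of this depth bound, the following sketch evaluates it for given |R|, k, and |𝓔|. It assumes that 𝓜≤k(R) is counted with all cardinalities from 0 to k (including the empty multiset), that |R| ≥ 1, and that |𝓔| is supplied by the caller; the function names are hypothetical.

```python
from math import comb

def num_multisets_up_to(num_rules, k):
    """Number of multisets over a set of num_rules elements with cardinality <= k."""
    # There are C(r + i - 1, i) multisets of cardinality exactly i over r elements.
    return sum(comb(num_rules + i - 1, i) for i in range(k + 1))

def depth_bound(num_rules, k, num_specs):
    """|M<=k(R) x E| + |Pow+(R) x E| + 1 for |R| = num_rules and |E| = num_specs."""
    nonempty_subsets = 2 ** num_rules - 1
    return (num_multisets_up_to(num_rules, k) * num_specs
            + nonempty_subsets * num_specs + 1)
```

For example, with |R| = 2, k = 2, and |𝓔| = 3 there are 6 multisets and 3 nonempty subsets, so the bound is 6·3 + 3·3 + 1 = 28. The bound grows exponentially in |R|, so it establishes decidability rather than a practical procedure.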
4.2 Nonconstructive Reentrancies
In this section, we show that the nonconstructivity condition of Definition 13 is decidable for suitable LFGs with k-bounded reentrancy-free kernel. For the proof we use a shrinking argument similar to that used for proving the decidability of the k-boundedness of the reentrancy-free kernel. The shrinking argument crucially depends on the fact that all long reentrancies can be shortened in a valid derivation, because, for descriptions including only instantiated short reentrancies and function assignments, nonconstructivity violations can be established through substitution proofs of a certain structure.
The decidability of the nonconstructivity condition then results from the fact that there are always canonical proofs of the form (19) of equalities that violate this condition if the descriptions contain only function assignments and short reentrancies.
Notice for the following that we can always shorten a proof of the form (18) (or (19)) if the right-hand terms of two derived equations are identical.
Lemma 4
Let FD be a description obtained from short reentrancies and function-assigning annotations by instantiating the metavariables with classes (or other elements) denoted by a, a1, a2, .., b, b1, ... If the equality of two distinct classes follows from FD, then there is at least one pair of distinct classes a and a′ whose equality is established by a proof of the form (19).
Proof
Suppose that FD ⊢ b = b′, for two distinct elements b and b′. Then there is a substitution proof of the form
where b is rewritten to b′. (This proof structure coincides with the one given in (18).) By assumption, the premises ei are either class-valued (of the form (b f) = c) or complex-valued ((b f) = (c g)). Without loss of generality, we can assume that the right-hand terms are pairwise distinct (tj ≠ ti, for m + 1 ≥ j > i ≥ 0). (If tj = ti, j > i, we obtain a shorter proof of b = b′ by deleting the inference of b = ti+1 from b = tj.) Suppose that the proof is not already (up to biunique renaming) of the form (19). Then there are two cases to consider.
(a) There are one or more terms ti, 1 ≤ i ≤ m, whose attribute sequences are longer than one. Let tk = (bkfk σ), σ ≠ ϵ be one of the longest terms (i.e., there is no other term that is longer than tk). Then step k must lie between a previous step l that lengthens the attribute sequence and a following step j that shrinks it, with el+1 = bl+1 = (blfl), tl = (blfl σ), tj = (bjfj σ) and ej = (bjfj) = bj−1. Obviously, bl+1 and bj−1 must be distinct because otherwise tl+1 and tj−1 would not be distinct, contradicting our assumption. But then there is a proof
that has up to biunique renaming the form (19).
(b) All terms ti are one-attribute terms or node classes. Because the proof is not of the form (19) there must be one or more tj, m > j > 1, that are node classes. Let tl be the first inferred term that is a node class and suppose that tl is bl. Then, by our assumption, b and bl are distinct. Hence the inference of b = bl (= tl) from b = b must have (up to biunique renaming) the form (19).
For the decidability proof we use the description sFD, which is similar to eFD except that the long reentrancies are shortened. Let †FD be the f-description obtained from FD by shortening all long reentrancies. Obviously, this f-description is logically equivalent to FD. Then sFD is the description that we obtain from †FD by replacing (similar to eFD) each occurrence of a node n of c in †FD by [n]. This substitution is safe because for any proof of an equality t = t′, involving at most n and n′, in †FD there is a corresponding proof of (t = t′)[n/[n], n′/[n′]] in sFD, and vice versa.15 The following lemma that establishes the decidability of the k-boundedness entails the decidability of the nonconstructivity.
Lemma 5
For any basic LFG grammar it is decidable whether it is k-bounded.
Proof
Let G be an arbitrary basic LFG grammar. We know already that it is decidable whether G is suitable and G’s reentrancy-free kernel is k-bounded. Thus assume that G satisfies these conditions. We show that it can be decided whether G∖↑=↓ satisfies the nonconstructivity condition of Definition 13 by inspecting derivations of (c-structure) depth less than or equal to 3|𝓜≤k(R) × 𝓔| + 1.
Suppose that G∖↑=↓ does not satisfy the condition of Definition 13. Then there is a derivation (c, ρ) of a terminal string s from S with f-description FD and clash-free FD∖𝓡 in G∖↑=↓, and [n] = [n′], with [n] distinct from [n′], can be derived from sFD∖𝒜 by a proof of the form (19). Now assume that the depth of the derivation exceeds this upper bound and we could not shrink (c, ρ) without invalidating the provability of [n] = [n′]. Then there must be at least four (distinct) nonterminal nodes , .., on a path of length greater than 3|𝓜≤k(R) × 𝓔|, with dominating and fm([]) = fm([]) (j = 1, .., 3), and for any pair [], [], the shrink operation would break up the proof of [n] = [n′] by eliminating necessary right-hand side premises. Now recall that two distinct nodes can be related in a single equation only if the nodes stand in a mother–daughter relationship and that, by definition of , the nodes of [], .., [] are licensed by the same rules and can therefore provide the same premises. Thus, the right-hand side premises must originate from equivalence classes from a subpath ..′ of the path from to with properly dominating and properly dominating ′. (If they would originate only from equivalence classes from the path from to , then we could apply to [], [] and [], [] and [n] = [n′] would still be provable.) From the properties recalled above, it also follows that we would obtain a shorter canonical proof of [n] = [n′] if no class-valued premise of the proof (the first and last premise in (19)) is eliminated. Because there are three distinct pairs but only two class-valued premises, for at least one pair the provability of [n] = [n′] must be preserved under , in contradiction to our assumption.16
If we apply the closure to sFD, instead of eFD∖𝓡, then atomic-valued information is not only inherited downward (as in the case of eFD∖𝓡), but reentrancies can cause it to propagate upward, shrink, or distribute across equivalence classes. An illustration of the propagation process is given in Figure 7. However, the term length of the atomic-valued equations in the closure of sFD again cannot exceed ℓ, because all long reentrancies have been shortened (|σ′| ≤ 1). The following lemma shows that clash-freeness of any f-description FD can be determined just by considering for each node class all atomic- and class-valued equations of the closure of FD that pertain to that class.
Lemma 6
Let FD be a description obtained from short reentrancies and function-assigning and atomic-valued annotations by instantiating the metavariables with classes denoted by a, a1, a2, .., b, b1, … If FD ⊢ a = b does not hold for two distinct classes a and b, then FD is clash-free iff the closure of FD does not contain equations of the form (a σ) = v and either (a σ) = v′ (v ≠ v′), or (a σ) = b, or (a σ σ′) = v′ with σ and σ′ both not equal to ϵ, for any class a occurring in FD.
Proof
Let FD be a description satisfying the conditions of the lemma. If FD is clash-free, then the closure of FD clearly satisfies the claim of the lemma. Now suppose that a clash follows from FD. We know that there is a proof of the form (18), repeated here for clarity of exposition,
where (a) tm+1 and t0 are either two distinct atomic values v and v′, or (b) one is a class b and the other is an atomic value v, or (c) t0 is a term of the form (v σ), σ ≠ ϵ, and tm+1 is a term or subterm occurring in FD. From arguments used in the proof of Lemma 4, we can assume that
- (i)
the right-hand side terms are pairwise distinct,
- (ii)
none of the terms ti, i = 1, .., m, is a class or atomic value and the premises ej, j = 2, .., m, are included in FD∖𝒜, because, under assumption (i), we would otherwise obtain a shorter proof of a clash or a violation of the condition of the lemma (thus the term length decreases or increases by at most one attribute at each rewriting step from tm to t1),
- (iii)
the length of the terms tm, .., t1 does not increase and then decrease.
Given these properties, the proof of the lemma can be completed as follows.
In case (a), tm+1 and em+1 must have the form v′ and v′ = (am σm), and t1 and e1 the form (a1 σ1) and (a1 σ1) = v. Let tj be one of the shortest terms (this exists because of (iii)) and suppose it has the form (a σ). Then (a σ) = v and (a σ) = v′ must be in the closure of FD because of (ii) and (iii).
Case (b) is similar to case (a). However, because tm (or t1) is a one-attribute term and thus one of the shortest terms, we can here simply assume j = m (or j = 1).
In case (c), t0 has the form (v σ), with σ ≠ ϵ. Thus t1 and e1 must have the form (a1 ζ σ) and (a1 ζ) = v. Let tj be one of the shortest terms that contains σ, i.e., tj = (a σ′ σ). Without loss of generality, we can assume σ′ ≠ ϵ, because otherwise ej, .., e2 must rewrite a to (a1 ζ) and there must be a shorter proof of a = v. Hence, by (ii), the terms tj+1, .., tm+1 cannot be shorter than tj. Because tj contains at least two attributes, tm+1 must be a (sub)term occurring in an atomic-valued equation v′ = (tm+1 ξ). Thus, (a σ′ σ ξ) = v′ and (a σ′) = v must be in the closure of FD because of (ii) and (iii).
The weak equivalence of k-bounded LFGs and k-LCFRSs (that we prove in Section 5.2) crucially depends on the ability to decide clash-freeness of every derivable f-description FD in a k-bounded LFG G∖↑=↓ through local clash tests on the closure of sFD as stated in Lemma 6. This is what enables us to simulate all the conditions for correct LFG derivation with a finite set of LCFRS predicate symbols and a finite set of LCFRS productions, and thus to rely on the finite control of the rule-by-rule predicate matching process of LCFRS derivation to produce the strings in L(G).
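The local clash test of Lemma 6 can be sketched as follows. The encoding is an assumption made for illustration: the closure is given as a set of atomic-valued equations (a, σ, v) and a set of class-valued equations (a, σ, b), with σ a tuple of attributes, and the precondition of the lemma (no two distinct classes are provably equal) is taken for granted. The name `locally_clash_free` is hypothetical.

```python
def locally_clash_free(atomic, class_valued):
    """Check the three local clash patterns of Lemma 6 on a closed description.

    atomic:       set of (a, sigma, v) for equations (a sigma) = v.
    class_valued: set of (a, sigma, b) for equations (a sigma) = b.
    """
    values = {}
    for a, sigma, v in atomic:
        values.setdefault((a, sigma), set()).add(v)

    class_terms = {(a, sigma) for a, sigma, _ in class_valued}
    for (a, sigma), vs in values.items():
        if len(vs) > 1:
            return False  # (a sigma) = v and (a sigma) = v' with v != v'
        if (a, sigma) in class_terms:
            return False  # (a sigma) = v and (a sigma) = b
        for i in range(1, len(sigma)):
            if (a, sigma[:i]) in values:
                return False  # a nonempty proper prefix already has an atomic value
    return True
```

Each test inspects only the equations that pertain to a single class a, which is the locality that the LCFRS construction of Section 5.2 relies on.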
5. k-Bounded LFGs and k-LCFRSs
In this section we recall the formal definition of linear context-free rewriting systems. Then we establish the (weak) equivalence of k-bounded LFGs and k-LCFRSs by proving that for each k-bounded LFG there is a weakly equivalent k-LCFRS. The proof is constructive in that it provides a procedure for constructing for any k-bounded LFG G a weakly equivalent k-LCFRS G′. The other direction of the equivalence follows trivially from Seki et al.’s (1993) result for finite-copying LFGs (which are properly included in the class of k-bounded LFG grammars). Thereafter, we refine the LCFRS grammar-construction algorithm so that the f-structure that is assigned to a derived string s in a k-bounded LFG G can be read out (in linear time) from the derivation of s in the corresponding k-LCFRS G′. Finally, we demonstrate that LCFRS equivalence extends to k-bounded LFG grammars that make use of the additional descriptive devices proposed by Kaplan and Bresnan (1982) and still in common use.
For the constructions in this section, we assume that the k-bounded LFG grammars have only unannotated terminals and epsilons. This is without loss of generality, because k-bounded LFGs can easily be transformed into equivalent k-bounded grammars in this more restricted format. The transformation is done by replacing each annotated terminal symbol (a, D) in the right-hand side of a rule by (Aa, D) and adding a rule Aa → a, where Aa is a new unique nonterminal, for each terminal a. Annotated epsilons are eliminated by replacing each rule of the form A → (ϵ, D) by two rules of the form A → (Aϵ, D), Aϵ → ϵ, where Aϵ is a new dummy preterminal. Obviously, the resulting grammar derives the same string/f-structure mapping as the original.
For the ↑ = ↓-free 2-bounded LFG grammar with the rules given in (13) we then obtain a grammar with the rules in (20).
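The transformation into the restricted format can be sketched as follows. The rule encoding is an assumption made for illustration: a rule is a pair of a left-hand category and a list of (symbol, annotations) daughters, the empty string stands for ϵ, and the new preterminals are named `A_<terminal>` and `A_eps`, which are assumed not to clash with existing categories. For simplicity the sketch replaces every terminal daughter, annotated or not.

```python
EPS = ""  # stands for the empty string epsilon in right-hand sides

def normalize_terminals(rules, terminals):
    """Move terminals and epsilons under fresh, unannotated preterminal rules.

    rules: list of (lhs, [(symbol, annotations), ...]) with annotations a frozenset.
    """
    new_rules, preterminals = [], {}
    for lhs, rhs in rules:
        new_rhs = []
        for symbol, annots in rhs:
            if symbol in terminals or symbol == EPS:
                fresh = "A_eps" if symbol == EPS else f"A_{symbol}"
                preterminals[fresh] = symbol
                new_rhs.append((fresh, annots))  # annotations stay on the new daughter
            else:
                new_rhs.append((symbol, annots))
        new_rules.append((lhs, new_rhs))
    for fresh, symbol in sorted(preterminals.items()):
        new_rules.append((fresh, [(symbol, frozenset())]))  # A_a -> a, unannotated
    return new_rules

# Hypothetical rule with one annotated terminal daughter.
RULES = [("S", [("NP", frozenset({"(↑ subj) = ↓"})),
                ("walks", frozenset({"(↑ tense) = pres"}))])]
NORMALIZED = normalize_terminals(RULES, {"walks"})
```

The annotations are carried over unchanged to the new preterminal daughter, so the f-descriptions of corresponding derivations coincide.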
5.1 Linear Context-Free Rewriting Systems (LCFRSs)
We now introduce linear context-free rewriting systems and their languages (see Kallmeyer [2013] for a survey on LCFRSs). For the definition we assume that 𝒱 is a set of variables.
Definition 19
An instantiation of an LCFRS rule r is obtained by replacing all variables in r with terminal strings. This is formally defined as follows.
Definition 20
Let r = A(α1, .., α𝔞(A)) → A1(, .., )..Am(, .., ). Then r′ is an instantiation of r if there is an η = {(, ) ∣ ∈ T* and 1 ≤ j ≤ m and 1 ≤ i ≤ 𝔞(Aj)} and r′ = r[η].
In the following we abbreviate A(α1, .., α𝔞(A)) (αl ∈ (T ∪ 𝒱)*) by A() if there is no need to refer to the particular arguments. The language of an LCFRS G is defined in terms of the set of instantiated nonterminals that G derives.
Definition 21
Let G be an LCFRS. Then the set of instantiated nonterminals derivable by G is the smallest set LN(G) satisfying the following conditions:
- (i)
if A() → ϵ ∈ R, then A() ∈ LN(G),
- (ii)
if A1(1), .., Am(m) are in LN(G) and A() → A1(1)..Am(m) is an instantiation of a rule in R, then A() ∈ LN(G).
As an illustration, consider the 3-LCFRS G = ({S, A}, {a, b}, S, R), with R as given in (21), which derives the double copy language {www ∣ w ∈ {a, b}+}.
This LCFRS derives, for example, the string abbabbabb because the instantiated non-terminal S(abbabbabb) is derivable as shown in (22).
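The bottom-up character of Definition 21 can be illustrated with a small sketch for the double copy language. Because the rule set in (21) is not reproduced here, the sketch assumes one possible 3-LCFRS for this language, namely S(xyz) → A(x, y, z), A(ax, ay, az) → A(x, y, z), A(bx, by, bz) → A(x, y, z), A(a, a, a) → ϵ, and A(b, b, b) → ϵ; these rules and the function names are assumptions and need not coincide with (21).

```python
def derivable_A(max_len):
    """Instantiated nonterminals A(x, y, z) with |x| <= max_len (max_len >= 1).

    Computed bottom-up for the assumed rules
      A(a, a, a) -> eps,  A(b, b, b) -> eps,
      A(ax, ay, az) -> A(x, y, z),  A(bx, by, bz) -> A(x, y, z).
    """
    layer = {("a", "a", "a"), ("b", "b", "b")}
    derived = set(layer)
    for _ in range(max_len - 1):
        # apply the two recursive rules to everything derived in the last round
        layer = {(c + x, c + y, c + z) for (x, y, z) in layer for c in "ab"}
        derived |= layer
    return derived

def derives(s):
    """True iff S(s) is derivable, using the assumed rule S(xyz) -> A(x, y, z)."""
    if not s or len(s) % 3:
        return False
    return any(x + y + z == s for (x, y, z) in derivable_A(len(s) // 3))
```

Under these assumptions, S(abbabbabb) is obtained from A(b, b, b), A(bb, bb, bb), and A(abb, abb, abb) in turn. The enumeration is exponential in the string length and only serves to make the fixpoint definition tangible.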
Derivability in LCFRSs can also be trivially restated in terms of labeled trees and licensing rule mappings, as an analogy to derivations in basic LFGs.
Definition 22
A pair (c, ρ) consisting of a labeled tree c and a mapping ρ from the nodes of c into R is a derivation of a tuple of strings (s1, .., sj), si ∈ T*, 1 ≤ i ≤ j, from B in LCFRS G iff
- (i)
for each terminal node n with label A(), ρn = A() → ϵ,
- (ii)
for each nonterminal node n with label A() and dts(n) = n1..nm with labels A1(1), .., Am(m), A() → A1(1)..Am(m) is an instantiation of ρn,
- (iii)
the label of root is B(s1, .., sj).
A terminal string s is derivable in LCFRS G iff there is a derivation of (s) from S. Because a terminal string s is derivable in G iff S(s) ∈ LN(G) the language of G is equal to {s ∈ T* ∣ s is derivable in G}.
The simple derivation tree for the string abbabbabb in the LCFRS comprising the rules in (21) is depicted in (23a) and its licensing rule mapping is given in (23b).
5.2 Weak Equivalence of k-Bounded LFGs and k-LCFRSs
For a given k-bounded LFG G we construct a k-LCFRS G′ and then show that L(G) = L(G′). The rules of G′ are based on the rules and categories of G∖↑=↓ and are constructed so as to simulate the derivations of that grammar. We have established that node equivalence classes [n] in G∖↑=↓ arise only through zipper unifications and that equivalent nodes are therefore at the same level of the c-structure. We also know that the daughters of equivalent nodes are themselves equivalent if and only if the rules that expand their mothers have the same function assignment annotations, and that those rules must contain function assignments that can be used to shorten any long reentrancies. Our construction makes use of these properties.
We first illustrate the LCFRS rule construction for LFG rules with nonterminal or preterminal daughters.17 Suppose that an equivalence class [n] (of not more than k nodes) arises in the course of a G∖↑=↓ derivation. Because the nodes in [n] are at the same level, they can be ordered according to the linear precedence relation ≺ on their terminal yields. We can thus assign to [n] a sequence of nonterminals Γ that label its ordered nodes (|Γ| = |[n]|), and use that label sequence in building the left-hand predicates of candidate LCFRS rules that might simulate how daughters of nodes with those labels are derived. We hypothesize rule sequences ϱ that could have been used to expand the nonterminals of Γ in a valid G∖↑=↓ derivation (i.e., a sequence ϱ of the form ϱ1 = Γ1 → ψ1, .., ϱ|Γ| = Γ|Γ| → ψ|Γ|). To take an example from the 2-bounded LFG grammar in (20), one such rule sequence for the sequence Γ = is given in (24).
We test such a candidate rule sequence to determine whether its rule annotations give rise to locally clash-free descriptions and whether they include the function assignments needed to shorten all long reentrancies. Rule sequences that do not meet these conditions cannot participate in a valid G∖↑=↓ derivation and can safely be removed from consideration.
The test is conducted on the description FDϱ. This description is constructed by instantiating ↑ in the annotations of ϱl by a constant bl and ↓ in the daughter annotations of ϱl by constants of a sequence βl whose length coincides with that of the right-hand side of ϱl. The b’s and the constants in the concatenation β = β1..β|Γ| must be pairwise distinct so that we make the same discriminations as the nodes of a valid derivation from these mothers. Because the mother nodes are all equivalent by hypothesis we include in FDϱ equations b1 = bi, for all i = 2, .., |Γ|. For this example we can instantiate the mothers with b1 and b2 and the daughter sequences with β1 = YZ and β2 = QR. For the rule sequence in (24) and these mother and daughter constants, FDϱ is the description in (25).
This description is clash-free and the long reentrancy (b2xs) = (b2o) can be shortened either through (b2x) = R, or (b1x) = Z and b1 = b2. This rule sequence thus remains as a candidate for further consideration. The constants used for daughter instantiation will also serve as the variables in constructed LCFRS rules (and henceforth we refer to them interchangeably also as variables).
The right-side predicates account for the possibility that FDϱ entails the equality of instantiating variables for separate nonterminal daughters of the rules in ϱ, that is, FDϱ ⊢ βi = βj. The equality of variables Z and R, for example, follows from the instantiated assignments (b1x) = Z and (b2x) = R, and the equality b1 = b2. The equalities between daughter variables give rise to equivalence classes [βi] corresponding to the node equivalence classes of a simulated derivation. We also include classes [βi] = {βi} for variables that do not appear in FDϱ because ↓ does not appear in the annotations of corresponding nonterminal daughter nodes. We impose an order on the variables within each equivalence class according to their order in β (which reflects the c-structure precedence relation ≺ of corresponding daughters). That is, we define a precedence order between the variables of β by βi ≺ βj if i < j.
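The entailed equalities between daughter variables can be computed by a simple closure. The following is a minimal sketch, assuming that only equalities between constants and function assignments of the form (b f) = β need to be considered; the function name `daughter_classes` and the triple encoding of assignments are illustrative assumptions, and the input covers only the three equations mentioned above rather than the full description in (25).

```python
def daughter_classes(beta, equalities, assignments):
    """Group the variables in beta into equivalence classes.

    equalities:  pairs (u, v) standing for u = v
    assignments: triples (mother, attr, value) standing for (mother attr) = value
    Classes are returned in the order of their least elements in beta.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    def union(u, v):
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[rv] = ru
        return True

    for u, v in equalities:
        union(u, v)

    # Functionality: equal mothers with the same attribute force equal values.
    changed = True
    while changed:
        changed = False
        seen = {}
        for mother, attr, value in assignments:
            key = (find(mother), attr)
            if key in seen:
                changed = union(seen[key], value) or changed
            else:
                seen[key] = value

    classes = {}
    for var in beta:                        # beta order reflects precedence
        classes.setdefault(find(var), []).append(var)
    return sorted(classes.values(), key=lambda c: beta.index(c[0]))

classes = daughter_classes(
    ["Y", "Z", "Q", "R"],
    [("b1", "b2")],
    [("b1", "x", "Z"), ("b2", "x", "R")],
)
print(classes)
```

Variables that occur in no equation, such as Y and Q here, end up in singleton classes, in line with the treatment of daughters whose annotations do not contain ↓.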
In our construction, the variables range over the yields of the right-side nonterminals in simulated derivations of G∖↑=↓. The left- and right-side LCFRS predicates are set up so that a derivable string instantiation of the ith argument of a predicate is the yield of the ith nonterminal of its predicate symbol in a G∖↑=↓ derivation.
We first construct one right-side predicate for each of the [βi] equivalence classes. The predicate symbol of the predicate is the concatenation of nonterminal categories corresponding to the ordered variables in [βi] and its arguments are just the ordered variables of the class. This depends on the fact that there is a one-to-one alignment of the variable sequence β and the sequence of categories projected from the concatenated daughters of the ϱ rules. The category projection Cat is defined for every annotated category (X, D) by Cat(X, D) = X and then extended in the natural way to strings of annotated categories. The alignment can be represented as a function θ that assigns to the ith element of β the ith element of ϕ = Cat(ψ1..ψ|Γ|), that is, θ(βi) = ϕi for i = 1, .., |β|. Applied to the concatenation of the right-hand sides of the rules ϱ1 and ϱ2 the Cat projection yields NPAv, and, for β = YZQR, θ is given by θ(Y) = NP, θ(Z) = , θ(Q) = Av, and θ(R) = . For the equivalence classes of our example, {β1} = {Y}, {β2, β4} = {Z, R} (with Z ≺ R), and {β3} = {Q}, we then obtain the predicates θ(Y)(Y) = NP(Y), θ(Z)θ(R)(Z, R) = (Z, R), and θ(Q)(Q) = Av(Q).
In LCFRS rules, the arguments of the left-side predicates describe how their yields are assembled from the yields instantiating the variables of the right-side predicates. We thus construct the lth argument of the left-side predicate Γ by substituting for each daughter in the Cat projection of the right-hand side of rule ϱl the variable that occurs in the corresponding position in βl. This ensures that the LCFRS rule assembles for each Γl the terminal string instantiation of the lth argument in accordance with Cat(ψl). For our example we thus obtain the left-side predicate (YZ, QR).
Because the right-side predicates of LCFRS rules must be arranged in a sequence, we extend the relation ≺ on variables to equivalence classes by ordering classes according to their least elements. Using the extended ≺ to arrange the corresponding predicates, we obtain the rule (26).
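The assembly of a skeletal rule from the variable classes can be sketched as follows. This is an illustration under stated assumptions rather than the construction itself: the function name `skeletal_rule` is ours, and the mother categories `M1`, `M2` and the daughter categories `C1`, `C2` are hypothetical placeholders for categories of the example grammar; only NP and Av are taken from the example above.

```python
def skeletal_rule(gamma, betas, theta, classes):
    """Build (left-side predicate, ordered right-side predicates).

    gamma:   mother categories, one per rule of the sequence
    betas:   daughter variable sequences beta_1, .., beta_|gamma|
    theta:   alignment of variables with daughter categories
    classes: equivalence classes of daughter variables
    """
    beta = [v for b in betas for v in b]
    pos = {v: i for i, v in enumerate(beta)}       # precedence order on variables
    # order variables inside each class, then classes by their least elements
    ordered = sorted((sorted(c, key=pos.get) for c in classes),
                     key=lambda c: pos[c[0]])
    # l-th left-side argument: the variables of beta_l in daughter order
    lhs = (tuple(gamma), tuple("".join(b) for b in betas))
    # one right-side predicate per class: concatenated categories + variables
    rhs = [(tuple(theta[v] for v in c), tuple(c)) for c in ordered]
    return lhs, rhs

lhs, rhs = skeletal_rule(
    ["M1", "M2"],                                   # hypothetical mother categories
    [["Y", "Z"], ["Q", "R"]],
    {"Y": "NP", "Z": "C1", "Q": "Av", "R": "C2"},   # C1, C2 are placeholders
    [["Q"], ["R", "Z"], ["Y"]],
)
print(lhs, rhs)
```

The class {Z, R} precedes {Q} because its least element Z precedes Q in β, even though R follows Q.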
We have already determined that FDϱ is locally clash-free. But we have not yet ensured that clashes do not arise from atomic values that other steps of a simulated derivation might associate with an equivalence class of nodes or with their daughter classes. To control for these (and other) potential clashes of the simulated derivations, we refine the left-side predicate symbol Γ and the right-side predicate symbols Γj with sets of atomic-valued equations E, Ej of 𝓔 that could arise in any valid LFG derivation as values for the corresponding equivalence classes. One such refinement of the skeletal rule (26) is shown in (27).
Because these refinements must give rise to clash-free descriptions we require them to be mutually compatible and consistent with the annotations of the hypothesized ϱ rule sequences. This constraint is implemented by checking for clash-freeness the f-description FD′ created by combining sFDϱ with the descriptions formed by appropriately instantiating the selected refinements. Specifically, we substitute [b1] for * in the refinement of the left-side predicate, and we replace * in the refinement of each right-side predicate by the predicate’s associated equivalence class of variables.
The atomic-valued information that the E refinements associate with the rule equivalence classes is also required to be invariant under closure of FD′, that is, the refinement of each equivalence class is assumed to record exactly the information that associates with that class. Under this requirement, the clash-freeness of the FD′ of each rule and Lemma 6 ensure that the finite control of the rule-by-rule predicate matching process of LCFRS derivations will simulate LFG derivations of G that have clash-free f-descriptions.
Finally, for the rules that expand preterminal categories into unannotated terminals Aa → a and epsilons Aϵ → ϵ, we add trivial LCFRS rules of the form (a) → ϵ and (ϵ) → ϵ, and we include a particular set of start rules. The start rules expand the root predicate S′(X) of the new grammar to unary predicates SE(X) consisting of the original start symbol and one clash-free set of atomic-valued equations E in 𝓔.
More precisely, the construction is as follows.
Definition 23
This construction produces for the 2-bounded LFG grammar with the rules in (20) an LCFRS that includes the rules in (28). These rules generate the same language as the original LFG. The construction creates many other categories and rules that are not shown here because they are either useless or redundant. Useless rules cannot participate in the simulation of any LFG derivation while redundant ones simulate only the same derivations as other rules and categories in the grammar.
Weak equivalence then follows by bottom–up induction on the depth of the derivations.
Lemma 7
For any k-bounded LFG G there exists a k-LCFRS G′ with L(G) = L(G′).
Proof
Let G′ be the k-LCFRS constructed for a k-bounded LFG G as in Definition 23.
We first show L(G) ⊆ L(G′). Let (c, ρ) be a derivation of a terminal string s from S with f-description FD and f-structure F in G∖↑=↓. Consider the equivalence classes [n] for the nonterminal nodes n of c. Obviously, |[n]| ≤ k for each n. Assign to each [n] a predicate ΓE(Y1, .., Y|Γ|), with 𝔞(Γ) = |[n]|, |Γ| = |[n]|, and E = ([n]), where Yl is the lth greatest element of [n] and Γl its label. The assignment of predicates to node classes, as well as the inductive step, are illustrated in Figure 8. We show bottom–up for each [n] that ΓE(s1, .., s|Γ|) ∈ LN(G′) where sl is the yield of Yl. Because G∖↑=↓ has no annotated terminals and the annotations of the preterminals do not contain ↓, [n] must be a singleton set and ([n]) = ∅, for any preterminal node n. Then ρY1 must be of the form Aa → a or Aϵ → ϵ, and (a) → ϵ or (ϵ) → ϵ must be in R′ by construction of G′. Hence (a) or (ϵ) is in LN(G′). Now suppose that [n] does not contain only preterminal nodes. Let [n1], .., [nm] be the equivalence classes of the nonterminal daughters of the nodes in [n] ordered according to their least elements and (, .., ), .., (, .., ) be the corresponding predicates. Obviously, the rule r = ΓE(dts(Y1), .., dts(Y|Γ|)) → (, .., )..(, .., ) must be in R′ if we construct LCFRS rules for the sequence ϱ1 = ρY1, .., ϱ|Γ| = ρY|Γ| according to Definition 23 with βl = dts(Yl) and E, E1, .., Em as specified above. We know by inductive hypothesis that (, .., ) ∈ LN(G′) where is the yield of . Define a substitution η by η() = and let ΓE(dts(Y1), .., dts(Y|Γ|))[η] = ΓE(s1, .., s|Γ|). Then ΓE(s1, .., s|Γ|) ∈ LN(G′) and the claim that sl is the yield of Yl holds by construction of r (see the illustration on the right-hand side of Figure 8). Hence, SE(s) is in LN(G′) and S′(s) in LN(G′) by S′(X) → SE(X).
We then establish L(G′) ⊆ L(G). Let (c′, ρ′) be a derivation of (s) from S′ in G′. We show bottom–up for each node n′ (except root) of c′ with label ΓE(s1, .., s|Γ|) that there are derivations of sl from Γl with root node nl and f-description FDl in G∖↑=↓ (as defined in Definition 3) such that all long reentrancies in FD = FDl ∪ {n1 = ni ∣ i = 2, .., |Γ|} can be shortened using the n1 equations and function assignments. Along the induction, we define descriptions (sFD enlarged by the instantiated E components of the derivation in G′) such that (a) sFD ⊆ , (b) ([n1]) = E, and (c) is clash-free. The clash-freeness of FD (and FDl) then follows from (a) and (c). If n′ is a terminal node then its label has the form (a) or (ϵ), and ρn′ is (a) → ϵ or (ϵ) → ϵ. Hence Aa → a or Aϵ → ϵ is in R by construction of G′ and there must trivially be a derivation of the required form. Moreover, with = ∅ also the other claims hold trivially because FD and E are empty. Now suppose that n′ is a nonterminal node with label ΓE(s1, .., s|Γ|) and daughters , .., with labels (, .., ), .., (, .., ), ΓE(s1, .., s|Γ|) → (, .., )..(, .., ) is an instantiation of = ΓE(β1, .., β|Γ|) → (, .., )..(, .., ), and the claims hold for the daughter predicates by inductive hypothesis. The inductive step is illustrated in Figure 9. Without loss of generality we can assume that is the root node of the derivation of for each j = 1, .., m and i = 1,.., |Γj| (rename the nodes if necessary). Now let nl be new (root) nodes and assume nl = bl (l = 1, .., |Γ|). We obtain derivations of sl from Γl with root nl and f-description FDl, l = 1, .., |Γ|, by licensing nl through Γl → ψl and by expanding the ith occurrence in Cat(ψl) using the derivation with root (i = 1, .., |ψl|). Then each FDl is the union of Inst(Γl → ψl, (bl, βl)) and the f-descriptions of the derivations with root , and FD must be FDϱ ∪ . 
It follows directly from the construction of and the inductive hypothesis that all long reentrancies in FD can be shortened. From the construction of G′ and the derivations, it is easy to see that sFD = sFDϱ ∪ sFDj. Now let = sFDϱ ∪ E[*/[n1]] ∪ . Then, obviously, sFD ⊆ . To show (b) and (c), recall that two nodes can be related in a single instantiated annotation only if the nodes stand in a mother–daughter relationship or if they are identical. Thus, sFDϱ and each can share only the class [] and only equations of the form ([] f σ) = v can factor into the closure of sFDϱ ∪ E[*/[n1]] and . Set FD′ = sFDϱ ∪ E[*/[n1]] ∪ Ej[*/[]] as in the definition of G′. Since ([]) = Ej by inductive hypothesis and ([]) = Ej, j = 1, .., m, and ([n1]) = E by construction of , we obtain (b) ([n1]) = E; moreover, for each j = 1, .., m, the restriction of to the classes in is equal to . Hence, the h values are preserved and the clash-freeness of follows by Lemma 6 from the clash-freeness of FD′ and the , which concludes the induction step. Thus G∖↑=↓ derives s from S if G′ derives (s) from SE for some E ∈ 𝓔, and hence s ∈ L(G∖↑=↓) if s ∈ L(G′).
Lemma 8 follows trivially from the corresponding result for finite-copying LFGs (Seki et al. 1993).
Lemma 8
For any k-LCFRS G there exists a k-bounded LFG G′ with L(G) = L(G′).
Hence we have Theorem 1.
Theorem 1
k-Bounded LFGs are weakly equivalent to k-LCFRSs.
Because the recognition and emptiness problems are decidable for LCFRSs, the same must hold for k-bounded LFGs.
Corollary 3
For k-bounded LFGs the recognition and emptiness problems are decidable.
5.3 The Construction of Strongly Equivalent LCFRSs
It is straightforward to augment the right-side predicate symbols of weakly equivalent LCFRSs with an additional component that records the annotations of the nonterminal daughters from which they originate. We can thus acquire access to the f-structure if we arbitrarily hypothesize at each left-side predicate symbol a subset of the set of G∖↑=↓ annotations. The predicates in N′ other than the start predicate S′ will have the form ΓE:D where D is a set of admissible (non-trivial) annotations. To illustrate the rule construction, consider as an example for a predicate symbol ΓE:D the refinement {(* p) = v}:{(↑ x) = ↓} of the predicate symbol {(* p) = v} that we used in our previous illustration of the grammar construction. Here, the D component records that the co-referring categories and are (both) annotated by (↑ x) = ↓ in G∖↑=↓. Again for the sequence of rules in (24) we proceed along the lines of Definition 23. The annotation component of the right-hand side predicates then just collects the annotations of the right-hand side daughter categories from which they are created. For our example rules we thus obtain, for example, the LCFRS rule in (29).
This component is empty for the right-hand side of the start rules S′(X) → SE:∅(X) (because the root category of an LFG derivation is not annotated).
The additional properties of strongly equivalent LCFRSs that enable f-structure recovery are made more precise in the following definition. In the definition we use the annotation projection An to access the annotations of the LFG rules. The An projection is defined for every annotated category (X, D) by An(X, D) = D.
Definition 24
For the rules in (20) this construction produces the rules in (30). All other candidate rules are again either useless or redundant.
With these elaborated rules an f-structure can be recovered from an LCFRS derivation that has been formulated in terms of derivation trees as per Definition 22. The nodes of a derivation tree can then be used to instantiate the ↑ and ↓ metavariables in the annotations in exactly the same way as in derivations with LFG grammars. The following extension of this definition then provides the required access to the f-structure. The instantiation function Inst is defined for the refined LCFRS rules as specified in Definition 2.
Definition 25
Let G be a k-bounded LFG and G′ be the k-LCFRS obtained from G as described in Definition 24. A pair (c, ρ) consisting of a labeled tree c and a mapping ρ from the nodes of c into R′ is a derivation of a tuple of terminal strings (s1, .., sj) from B with functional description FD and f-structure F in G′ iff
- (i)
(c, ρ) is a derivation of (s1, .., sj) from B,
- (ii)
FD = ⋃{Inst(ρn, (n, dts(n))) ∣ n ∈ Dom(ρ) and n is a nonterminal node},
- (iii)
F = M|𝒜∪𝒱, where M is a minimal model of FD.
As an illustration, consider the derivation of nnnvvv in the LCFRS comprising the rules in (30). The derivation tree of this derivation is depicted in Figure 10 and its licensing rule mapping is given in (31).
From the f-description depicted in (32), it is easy to see that this grammar associates with nnnvvv the same f-structure as the 2-bounded LFG grammar in (1) from which it was created.
Along the lines of the argument used in the proof of Lemma 7 it is straightforward to see that G and G′ derive the same string/f-structure mapping.
Theorem 2
Let G be a k-bounded LFG and G′ be the k-LCFRS constructed for G as in Definition 24. Then ΔG = ΔG′.
Because the f-description of every derivation in G′ is clash-free, the f-structure can be read out from any particular derivation in linear time.
5.4 Other Descriptive Devices
The LFG formalism as originally proposed by Kaplan and Bresnan (1982) includes formal devices beyond the primitive attribute and value mechanisms of the basic subclass, and LFG theory contemplates configurations of annotated rules that do not obviously meet the restrictions we depend on. In this section we show how those devices and configurations can be accommodated in the k-bounded framework, either by conversion to equivalent rules and annotations of the more restricted form or by extensions to the k-LCFRS translation procedure.
C-Structure Regular Predicates and Boolean Combinations of Elementary Annotations. The full LFG notation allows functional requirements to be stated as arbitrary Boolean combinations of basic annotations. It also allows the right-hand sides of c-structure rules to denote arbitrary regular languages over annotated categories. Rules with the richer notation, including Kleene-star iterations, can be linearized into collections of productions all of which are in conventional context-free format and have no internal disjunctions and which together define the same string/f-structure mapping as a grammar encoded in the original, linguistically more expressive, notation (Wedekind and Kaplan 2012). Although sufficient in principle to establish the LCFRS equivalences, such a conversion may not be possible in practice (see the discussion in the next section).
Kleene-star iterations are conventionally translated into equivalent right-linear expansions that make use of trivial annotations. For example, the iteration of the annotated category (A, D) in (33a) is replaced by a new optional trivially-annotated category AD, and the rule (33b) is introduced to properly expand that category. (In (33b) we use the traditional parenthetic notation for optionality, to collapse into a single rule the expansion with or without the recursive category.)
Obviously, grammars with such recursions would violate condition (ii) of Definition 10, because this by itself expresses no bounds on the trivially annotated dominance chains. There must be additional properties of the grammar that prevent the annotations in D from creating unbounded zippers. We discuss this problem in the next section.
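The right-linear translation of Kleene-star iterations can be sketched in a few lines. This is a minimal illustration, assuming a rule encoding in which each right-hand side item is a triple (category, annotation, mark) with mark `"*"` for iteration and `"?"` for optionality; the encoding, the function name `expand_kleene`, and the naming scheme for the new category are our assumptions.

```python
TRIVIAL = "↑=↓"

def expand_kleene(lhs, rhs):
    """Replace each starred item (cat, ann, "*") by an optional, trivially
    annotated fresh category and add a right-linear rule that expands it."""
    rules = {}
    new_rhs = []
    for cat, ann, mark in rhs:
        if mark == "*":
            fresh = f"{cat}[{ann}]"      # hypothetical name for the new category
            new_rhs.append((fresh, TRIVIAL, "?"))
            # fresh -> (cat, ann) (fresh, trivial); "?" marks the optional recursion
            rules[fresh] = [(cat, ann, ""), (fresh, TRIVIAL, "?")]
        else:
            new_rhs.append((cat, ann, mark))
    rules[lhs] = new_rhs
    return rules

rules = expand_kleene("X", [("B", "E", ""), ("A", "D", "*")])
for left, right in rules.items():
    print(left, "->", right)
```

The recursion through the trivially annotated fresh category is exactly what produces dominance chains of unbounded length in the translated grammar.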
Adjuncts. In LFG the f-structures of modifiers that serve as adjuncts of a predicate are represented as elements of the set-valued attribute adjuncts. Formally, this is accomplished by annotations of the form ↓ ∈ (↑ adjuncts) stating that the value of the adjuncts attribute is a set that includes as one of its elements the f-structure associated with the annotated constituent.18 Set elements behave formally like daughters without function assignments. They therefore can be handled by our original construction without further modification. However, because LFG allows the rules to contain Kleene-starred adjunct phrases as in (34), the number of adjuncts is potentially unbounded.
The modifier Kleene-star iteration can be translated to an equivalent right-linear expansion as above. The trivial annotations in this particular right-linear expansion do not have to be eliminated in order to locally disclose structure sharing through zipper unification: Derivations of set-valued adjuncts cannot create or add to zippers. We substitute for the recursive trivial a variant ↑ ≐ ↓ that is opaque to the trivial-elimination procedure and thus not removed from the grammar. It is carried along by the LCFRS translation algorithm (using an extended sFD closure supporting equations of the form a ≐ b)19 and interpreted as a node identity only during f-structure construction.
Positive and Negative Constraints. LFG annotations are divided into two classes: defining and constraining annotations. The instantiated constraining annotations are evaluated once all instantiated defining annotations have been processed and a minimal model (of the defining statements) has been constructed. The constraining devices introduced by Kaplan and Bresnan (1982) are constraining equations and inequalities, and existential and negative existential constraints. If a constraining statement is contained in an f-description FD, it is evaluated against a minimal model M of the defining statements of FD in the following way: M ⊧ t =c t′ iff M ⊧ t = t′ (constraining equation), M ⊧ t iff ∃t′(M ⊧ t = t′) (existential constraint), and M ⊧ ¬γ iff M ⊭ γ (negation of a constraining or defining statement).
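The evaluation of constraining statements against a minimal model can be illustrated with a short sketch. It assumes that the minimal model of the defining statements is already given as a nested dictionary, that terms are attribute paths from a single root, and that the right-hand side of an equation is an atomic value; the statement encoding, the function names, and the attribute names in the example are illustrative assumptions.

```python
_MISSING = object()   # sentinel for paths that the model leaves undefined

def resolve(model, path):
    """Follow an attribute path through the model; _MISSING if undefined."""
    cur = model
    for attr in path:
        if not isinstance(cur, dict) or attr not in cur:
            return _MISSING
        cur = cur[attr]
    return cur

def holds(model, stmt):
    """Evaluate a statement against the minimal model of the defining statements."""
    kind = stmt[0]
    if kind in ("=c", "="):          # t =c v holds iff the model satisfies t = v
        _, path, value = stmt
        return resolve(model, path) == value
    if kind == "exists":             # t holds iff the model assigns t some value
        return resolve(model, stmt[1]) is not _MISSING
    if kind == "not":                # negation of a constraining or defining statement
        return not holds(model, stmt[1])
    raise ValueError(f"unknown statement kind: {kind}")

# Hypothetical minimal model built from defining statements only.
M = {"subj": {"num": "sg"}, "tense": "past"}

print(holds(M, ("=c", ("subj", "num"), "sg")))
print(holds(M, ("not", ("exists", ("obj",)))))
```

Because the constraints never add information to the model, a constraining equation fails where a defining equation with the same content would simply extend the f-structure.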
It is fairly straightforward to extend the LCFRS construction in Definition 23 to LFG grammars with negative constraints. If we assume the negative constraints, just as the atom-valued defining statements, to be propagated across the E components, then we can properly account for them by requiring the negative constraints of FD′ to be satisfied in a minimal model M of the defining statements of FD′. The extension to grammars that make use of positive constraints is for several reasons far more challenging.
The construction of Definition 23 creates LCFRS rules that simulate the derivations of a corresponding LFG grammar. Consistency of atomic-value information that might be inherited from other rules of an LFG derivation is enforced by matching the predicates of individual LCFRS rules, given that the predicates are refined by E components that hypothesize a collection of potentially inherited values. The E values include those that are necessary to guarantee that derivations contain no clashing atomic values, but as specified they may also include information that is not defined elsewhere in the simulated derivation. Positive constraints thus cannot be tested against these overly large E components. An alternative refinement strategy is exploited to ensure that constraints can be checked against properly limited subsets of information.
We build on the specification of Definition 23 substituting Ě refinements for the atom-value components of the LCFRS rules constructed there. For the terminal expansions Aa → a and Aϵ → ϵ, the LCFRS rules have the form (a) → ϵ and (ϵ) → ϵ, as in the original procedure.
- (i)
FD′ = sFDϱ ∪ Ěj[*/bj] is clash-free,
- (ii)
Ě = {(* σ) = v/(* σ′) ∣ FD′ ⊢ (b σ) = v/(b σ′) and |σ|, |σ′| ≤ ℓ}.
Given that the positive constraints are ℓ-bounded, the description FD′ now provides all the information necessary to determine whether or not a positive constraint in sFDϱ is satisfied below the ϱ-expanded nodes of an equivalence class of the simulated derivations.20 Thus, only positive constraints that are not satisfied in the minimal model of the defining statements of FD′ and are hence supposed to be satisfied higher up in the simulated derivations require some special attention.
We augment the LCFRS predicate symbols with an additional component Č that records the constraining equations (* σ) =c v and existential constraints (* σ) (|σ| ≤ ℓ), in the following abbreviated by (* σ)[=c v], that are supposed to be satisfied higher up in the simulated derivations. For the terminal expansions and the start rules these components are empty. Thus, these LCFRS rules have the form (a) → ϵ, (ϵ) → ϵ, and S′(X) → SĚ:∅(X).
- (i)
if (b σ)[=c v] is in sFDϱ but not satisfied in a minimal model of the defining statements of FD′, then (* σ)[=c v] is included in Č,
- (ii)
if (bj σ′)[=c v] is in sFDϱ or (* σ′)[=c v] in Čj and (bj σ′)[=c v] is not satisfied in a minimal model of the defining statements of FD′, then Č includes (* σ)[=c v] with FD′ ⊢ (b σ)[=c v] ≡ (bj σ′)[=c v] and |σ| ≤ ℓ.
In this extended LCFRS construction, the unsatisfied constraints in sFDϱ and the daughter Čj are propagated to the mother Č. Because the Č components of the start rules are empty, all constraints in the lower Č components and instantiated class descriptions sFDϱ must be satisfied in the simulated derivations.
Completeness and Coherence. The Completeness and Coherence Conditions are the formal devices in LFG that enforce the subcategorization requirements of individual predicates. The governable grammatical functions that can and must appear in an f-structure are specified in its local semantic form, the single-quoted values of its pred attribute. We can take these requirements into account by interpreting the subcategorization frame as a collection of existential constraints. For completeness, we introduce a positive existential constraint (↑ g) for each function g that a semantic form governs. For coherence, we pair with every assignment of a governable function g a positive existential constraint that tests whether (↑ pred) designates a semantic form that subcategorizes for g. The result of these augmentations is a grammar that properly enforces all subcategorization requirements.
Indexing of Semantic Forms. Semantic forms are also indexed or instantiated in the sense that a new and distinct value is created to represent each semantic form as it enters into a derivation (Kaplan and Bresnan 1982). This would result in a clash if a description contains two instantiations ([n] σ) = s where s is a semantic form and not a simple atomic value. We elaborate the conditions of the LCFRS construction to ensure that this property of semantic forms is respected in all simulated derivations. Specifically, we exclude an LCFRS rule if the combination of sFDϱ and Ěj hypotheses that make up its justifying description would include two or more instances of the same semantic form.
6. Practical Considerations
We have shown that for every k-bounded LFG G there is a weakly equivalent k-LCFRS G′ and that the f-structures assigned by G can be recovered easily from the derivations of a refinement of G′. We also know that the universal recognition problem for any k-LCFRS G′ can be solved in time 𝒪(|G′| ⋅ n^(k⋅(r+1))) (Seki et al. 1991). The recognition problem is tractable in the mathematical sense—polynomial in the length of the input sentence—but the polynomial term may be dwarfed by the grammar constants. In this formula, |G′| is the number of rules in G′, n is the length of the input string, r is the rank (the maximum number of nonterminals (predicates) on the right-hand side of G′ rules), and the given k is the fan-out (the maximum arity). Thus the practical benefit of recognizing a k-bounded LFG language with an equivalent LCFRS depends on the fan-out and rank but also on the size of the LCFRS.
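To see how the parameters interact, the bound |G′| ⋅ n^(k⋅(r+1)) can be evaluated directly. The following sketch only computes the value of the bound with constant factors ignored; the function name and the sample parameter values are illustrative assumptions, not measurements of any actual grammar.

```python
def recognition_bound(num_rules, n, k, r):
    """Value of |G'| * n**(k*(r+1)) for grammar size, input length, fan-out, rank."""
    return num_rules * n ** (k * (r + 1))

# Hypothetical values: 1,000 rules, a 10-word input, fan-out 2, rank 2.
print(recognition_bound(1000, 10, 2, 2))
# Raising the fan-out from 2 to 3 raises the exponent of n from 6 to 9.
print(recognition_bound(1000, 10, 3, 2) // recognition_bound(1000, 10, 2, 2))
```

The grammar size enters only linearly, but it can still dominate for short inputs when the translation multiplies the number of rules.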
We can establish bounds on the parameters of the general complexity formula (|G′|, k, r) as a function of the characteristic properties of natural language grammars.21 We first consider linguistically motivated grammars in the basic formalism we have defined here and thereafter grammars used in the more flexible format that is typically used in linguistic practice.
The translation of a given k-bounded LFG grammar G to an equivalent LCFRS G′ is carried out in two phases, and each has an effect on the size of G′. In the first phase the trivial ↑ = ↓ annotations are eliminated and in the second phase a collection of LCFRS rules is created for each sequence of up to k rules of G∖↑=↓.
The first phase, the elimination of trivial annotations from a basic k-bounded G, can by itself result in a grammar G∖↑=↓ that is much larger than G. The growth in the number and length of the rules depends on the bound h on the height of functional domains in G and the number of alternative rules, the degree of ambiguity, that expand the categories that are trivially annotated in G. From Definition 14 we know that the size of the nonterminal category set |N| must be an upper bound for h, but that may substantially overestimate what is needed for LCFRS translation of natural language grammars. In linguistically motivated grammars the distribution of trivial annotations is regulated by the principles of X-bar theory and its structure–function mapping principles (Bresnan 2001, Chapter 6; Dalrymple 2001, Chapter 4). In this (ϵ-free) framework the components of an f-structure unit are introduced recursively by trivially annotated categories, and the height of a functional domain is effectively bounded by the number of coheads that can associate to a single predicate, the number of discontinuous c-structure phrases that can realize a particular function, and the number of different grammatical functions that an individual predicate can govern. With g denoting the maximum number of governable grammatical functions that can occur together and c denoting the maximum number of cooccurring coheads plus 1 (accounting for the lexical head), the maximum height is given by kg + c. Then, because any daughter sequences must be promoted for any trivially annotated category in the worst case, the size of G∖↑=↓ cannot exceed |G|^(kg+c+1). (Increasing the exponent by one accounts for the trivial-free rules obtained from sequences shorter than kg + c.)
The g, c, and k parameters are typically rather small. For instance, the lexicons of the broad-coverage, commercial-grade Pargram grammars for English and German (approximately 25,000 words each) have no word that subcategorizes for more than four grammatical functions and very few words allow even that many (in English, only the word bet).22 Thus, for these grammars g is 4.
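As a worked example of the worst-case size bound |G|^(kg+c+1) for G∖↑=↓, the following sketch evaluates it for small parameter values. Only g = 4 is taken from the observation above; the values chosen for k, c, and |G| and the function names are hypothetical.

```python
def max_domain_height(k, g, c):
    """Maximum height kg + c of a functional domain."""
    return k * g + c

def trivial_free_size_bound(size_g, k, g, c):
    """Worst-case bound |G|**(kg + c + 1) on the size of the trivial-free grammar."""
    return size_g ** (max_domain_height(k, g, c) + 1)

# g = 4 as observed above; k = 2, c = 2, and |G| = 10 are hypothetical.
print(max_domain_height(2, 4, 2))
print(trivial_free_size_bound(10, 2, 4, 2))
```

Even for small parameters the worst-case bound is very large, which is why restricting the set of rules that take part in trivial elimination matters in practice.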
The |G| parameter in the G∖↑=↓ size formula anticipates that every rule of the LFG grammar participates in the trivial elimination process, but this likely overstates what is necessary to ensure that all zippers can be properly identified. We noted in relation to the recursive expansion of adjuncts (Section 5.4) that trivials that cannot affect the equivalence classes of a derivation do not interfere with the LCFRS construction, and therefore do not have to be removed. The recursive adjunct categories are a special case of a more general class of categories that are inert with respect to zipper interactions. A category is inert in this sense if the zippers within its expansions in all possible derivations do not unify with the zippers outside. It is safe to protect from elimination all trivial annotations attached to inert categories simply by replacing them with ↑ ≐ ↓. Kaplan and Wedekind (2019) observe that internal adjuncts of discontinuous NPs are another instance of inertness. Thus even certain function-assigned categories can be inert and their function assignment (↑ f) = ↓ can therefore be replaced with a variant annotation of the form (↑ f) ≐ ↓ that explicitly indicates that this function assignment does not conceal zipper interactions. The effect of converting = to ≐ on inert categories is to break the chain of daughter-sequence promotions and to reduce, perhaps substantially, the number of rules of G that actually contribute to the size of G∖↑=↓.
The second phase of the LCFRS construction is the refinement of LCFRS predicates by the elements of 𝓔 that represent the possible combinations of atomic values. Here we see that the potential growth is limited by the properties of linguistically motivated feature systems. The attribute sequences σ in atomic-value annotations can be characterized more precisely as consisting of a sequence γ of zero or more grammatical function attributes followed either by the pred attribute or a sequence μ of one or more attributes drawn from a set of morphosyntactic features. The morphosyntactic attributes do not appear in reentrancies and so remain stable as reentrancies introduce variation in their γ prefixes. Morphosyntactic attributes may therefore be ignored when determining the length limit of attribute chains in 𝓔 components. The parameter ℓ is thus the length of the longest γ subsequence, and according to the Functional Locality Principle its maximum value is 2. The γ attributes can be distinguished from the μ attributes as those that appear only in reentrancies. But in theory and practice, the attributes in μ and the atomic values are statically typed in feature declarations (King et al. 2005) so that the set of values is partitioned across the direct morphosyntactic features (e.g., num), and the direct features are partitioned across the grouping features (e.g., agreement). Typically, the grouping features are merely for conceptual convenience, in which case they do not by themselves give rise to an increase of the description space. It may also be the case that different combinations of morphosyntactic features may be associated with different grammatical functions: The feature num may be associated with subj and obj but not xcomp while xcomp but not the nominal functions may take passive. 
Thus for linguistic grammars the atom-valued information space 𝓔 is much more sparse than it might be for an unregulated feature system, and the ℓ expansion factor has a tight universal bound.
However, even for linguistically motivated feature systems, the second phase might still result in a huge number of E refinements. This can arise, for example, if lexical items are assigned alternative combinations of agreement features and if reentrancies have the effect of manipulating their γ prefixes so that they propagate unpredictably across the equivalence classes in derivations. But different kinds of reentrancies propagate information in different ways and impose different requirements on the E rule refinements that correctly simulate their operation. Reentrancies of the form (↓ g) = ↑ are the bottom–up counterpart of top–down function assignments. When they appear in the annotations of a candidate rule sequence, an agreement feature matching g in a daughter E component must be reflected (with g removed) in the mother component. These lifting reentrancies require information to propagate bottom–up, but the degree of expansion is strictly limited by the ℓ bound, just as for function assignments.
Reentrancies in a second subclass introduce more feature variability and hence larger E refinements. Reentrancies of the form (↑ g) = (↑ h), for example, result in the replacement of the initial g by h in otherwise required elements of the mother E component. The effect is still limited by the local annotations because it depends on what other agreement features and function assignments are separately required by the candidate rule sequence. It is also limited by the fact that the E components are restricted to the morphosyntactic features that can associate with both the g and h functions.
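The two kinds of propagation just described can be pictured as simple rewritings of attribute paths. The following is only an illustrative sketch under assumptions that are not part of the formal construction: a component is modeled as a set of (path, value) pairs, a path is a tuple of attribute names, and the function names `lift` and `rename_initial` are hypothetical.

```python
def lift(daughter_component, g):
    """Sketch of a lifting reentrancy (↓ g) = ↑.

    Every daughter element whose path starts with g is reflected in the
    mother component with that initial g removed; other elements are dropped.
    """
    return {(path[1:], value)
            for path, value in daughter_component
            if path[:1] == (g,)}


def rename_initial(mother_component, g, h):
    """Sketch of a reentrancy (↑ g) = (↑ h).

    Elements whose path starts with g are restated with h as the initial
    attribute; all other elements are kept unchanged.
    """
    return {((h,) + path[1:] if path[:1] == (g,) else path, value)
            for path, value in mother_component}


# Hypothetical agreement information, for illustration only.
daughter = {(("subj", "num"), "sg"), (("tense",), "pres")}
lifted = lift(daughter, "subj")
renamed = rename_initial({(("subj", "num"), "sg")}, "subj", "obj")
```

In both cases the length of the grammatical-function prefix never grows, which is the informal reason why the expansion stays within the ℓ bound.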
The most harmful reentrancies, in terms of grammar expansion, are those of the form (↓ g) = (↑ h) that specify, for example, the relationship of an xadjunct subj to a matrix function or when a long functional control equation (e.g., (↑ xcomp subj) = (↑ obj)) is reduced to a short reentrancy (↓ subj) = (↑ obj). These propagate information through candidate rule sequences without regard to other local properties and thus must give rise to E components that are locally unconstrained. Of course, as just noted, only morphosyntactic features common to both g and h must be considered, and this removes many features from consideration in linguistic grammars where control typically ranges over nominal functions with common feature sets. This is not an accidental property of control equations in linguistically motivated grammars; it was claimed as a universal principle of language in the earliest formulations of LFG theory (Bresnan et al. 1982). The Lexical Rule of Functional Control stipulates that lexical control equations can only be of the form (↑ xcomp subj) = (↑ gf), where gf is one of subj, obj, or obj2, and the Constructional Rule of Functional Control provides for phrasal annotations that pair the function assignment (↑ xadjunct) = ↓ with the short control reentrancy (↓ subj) = (↑ gf). These principles may gain their explanatory and descriptive power because natural languages are organized to minimize the computational impact of these most promiscuous reentrancies.
We then obtain a bound on the size of G′ for a linguistically motivated G as follows. Because the rank of the LCFRS G′ is bounded by g + c and the LCFRS predicates for the grammatical functions can at most be k-ary and the predicates for the coheads are unary, the size bound on G∖↑=↓ already accounts for rule sequences of length up to k and therefore the number of 𝓔-unrefined skeleton LCFRS rules can also not exceed |G|^{kg+c+1}. With a denoting the maximum number of attested agreement/pred feature combinations, the size of G′ is bounded by a^{g+c+1}|G|^{kg+c+1}.
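To get a feeling for how the two bounds behave, they can be evaluated directly. The sketch below simply computes the exponent kg + c + 1 and the factor a raised to g + c + 1 as stated above; the function names and the sample parameter values are hypothetical and are not measurements of any actual grammar.

```python
def trivial_free_bound(grammar_size: int, k: int, g: int, c: int) -> int:
    """Upper bound |G|^(k*g + c + 1) on the size of the trivial-free grammar."""
    return grammar_size ** (k * g + c + 1)


def lcfrs_bound(grammar_size: int, k: int, g: int, c: int, a: int) -> int:
    """Upper bound a^(g + c + 1) * |G|^(k*g + c + 1) on the size of G'."""
    return a ** (g + c + 1) * trivial_free_bound(grammar_size, k, g, c)


# Hypothetical values: |G| = 10, k = 2, g = 4, c = 2, a = 3.
skeleton = trivial_free_bound(10, k=2, g=4, c=2)   # exponent 2*4 + 2 + 1 = 11
refined = lcfrs_bound(10, k=2, g=4, c=2, a=3)      # extra factor 3**7
```

Even with these small parameters the worst-case bound is very large, which is why the restrictions discussed in this section, rather than the bound itself, carry the practical argument.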
The broad-coverage grammars written by practicing linguists and grammar engineers diverge from our basic formalism in two ways: They come equipped with separate and quite extensive lexicons and they use single rules with regular right sides to succinctly represent the many alternative daughter sequences of particular categories. Obviously, LCFRS equivalents can be constructed for grammars in this format by applying a preprocessing step that converts them to the terminals and rules of the basic formalism, but this initial expansion step may result in starting grammars for LCFRS analysis that are already of impractical size.
It is not difficult to preserve the modularity of a separate lexicon and to insulate the LCFRS construction from the sheer number of lexical entries. An entry associates a word to a set of senses, each of which is a prelexical category/annotation pair. The naive approach would create a new terminal and prelexical rule for each such word-sense combination, but this does not take advantage of the fact that sense specifications are shared by large subsets of the vocabulary. Almost all intransitive verbs, for example, are marked with the same category and with annotations that differ only in the particular relation (walk vs. die) embedded in their pred semantic forms. A simple technique is to remove the specific relations from semantic forms, group the senses into classes according to the categories and relation-abstracted annotations, let those abstracted senses enter as terminals into the LCFRS construction, and keep for later use a separate lexical table that maintains the association between semantic relations and abstracted senses. LCFRS size is then a function of the relatively small number of abstract-sense classes, not the overall size of the vocabulary.
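The grouping technique can be sketched as follows. This is a minimal illustration under assumed data structures: a sense is a hypothetical triple of category, semantic relation, and relation-abstracted annotations (with `_` standing in for the removed relation), and `abstract_lexicon` is an illustrative name.

```python
from collections import defaultdict


def abstract_lexicon(lexicon):
    """Group word senses into abstract-sense classes.

    lexicon maps each word to a list of (category, relation, annotations)
    senses. Returns (classes, table): classes maps each abstract sense
    (category plus relation-abstracted annotations) to a terminal id, and
    table records which (word, relation) pairs belong to each terminal id.
    """
    classes = {}
    table = defaultdict(set)
    for word, senses in lexicon.items():
        for category, relation, annotations in senses:
            key = (category, frozenset(annotations))
            terminal_id = classes.setdefault(key, len(classes))
            table[terminal_id].add((word, relation))
    return classes, dict(table)


# Hypothetical toy lexicon; the annotation strings are illustrative only.
INTRANS = ["(↑ pred) = '_<(↑ subj)>'"]
TRANS = ["(↑ pred) = '_<(↑ subj)(↑ obj)>'"]
LEXICON = {
    "walk": [("V", "walk", INTRANS)],
    "die": [("V", "die", INTRANS)],
    "see": [("V", "see", TRANS)],
}
classes, table = abstract_lexicon(LEXICON)
```

Only the abstract classes would enter the construction as terminals; the table is consulted afterwards to restore the specific relations.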
It remains to address the second growth factor in the LCFRS construction for broad-coverage LFG grammars, the conversion from the regular right side format of typical LFG rules (as compiled into finite-state machines) to the linearized representation of the basic rule format that we have exploited in our theoretical approach to LCFRS construction. As an example, the finite-state encoding of the English progressive VP rule before trivial elimination has 15 states and 31 transitions, defining 324 linear paths with an average length of 11. Trivial elimination can be carried out on the finite-state representation, and this produces a larger but still quite manageable finite-state machine with 52 states and 96 transitions. But the number of paths encoded in that machine is substantially greater, 110,000, and that would be the size of the equivalent set of linearized, basic rules that would enter into the equivalence-class construction. Thus, linearization of regular expression right sides is theoretically possible, as described in Section 5.4, but the resulting expansion in the number of basic rules, essentially a conversion to disjunctive normal form, may then become the major challenge to practicality.
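The gap between the size of a machine and the number of paths it encodes is easy to reproduce. The sketch below counts accepting paths by dynamic programming without enumerating them. It assumes an acyclic machine given as a dictionary from states to lists of (label, next state) transitions, and the machine built here is a made-up example, not the VP rule mentioned above.

```python
from functools import lru_cache


def count_paths(transitions, start, finals):
    """Count accepting paths of an acyclic finite-state machine.

    transitions maps a state to a list of (label, next_state) pairs.
    The machine is assumed to be acyclic; a cycle would make the
    recursion run forever.
    """
    @lru_cache(maxsize=None)
    def paths_from(state):
        total = 1 if state in finals else 0
        for _label, nxt in transitions.get(state, ()):
            total += paths_from(nxt)
        return total

    return paths_from(start)


# Hypothetical machine: 10 positions with 3 alternative labels each,
# i.e. 11 states and 30 transitions.
MACHINE = {i: [(f"x{i}{alt}", i + 1) for alt in "abc"] for i in range(10)}
n_states = 11
n_transitions = sum(len(arcs) for arcs in MACHINE.values())
n_paths = count_paths(MACHINE, start=0, finals=frozenset({10}))
```

Here 30 transitions encode 3^10 = 59,049 distinct linear rules, the same kind of blow-up that linearization of regular right sides produces.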
A potential solution is to perform some or all of the equivalence class calculations directly on the transitions of the finite-state machine before enumerating all of its paths. It is easy to extract all the transition labels (categories and annotations) that assign a particular grammatical function and to test them for mutual satisfiability. Finite-state operations can be applied to prune the machine of paths that contain inconsistent combinations, and the remaining transitions can be reorganized to bring together the labels of consistent combinations. This changes the sequential order of the categories, but in exactly the way this is done in the LCFRS rule construction. The difference is that a single adjustment might produce a class representation that is common to a large number of basic rules. The machine that emerges from this process would be a compact representation, in finite-state form, of all and only the LCFRS rules that meet the conditions of Definition 23; no unsatisfactory paths would remain in the machine.
In the worst case, where equivalent categories are randomly distributed throughout the machine, the machine will incrementally grow to approximate a disjunctive set of independent rule sequences. But experience with XLE and the Pargram grammars suggests that equivalent categories for natural language grammars are distributed more systematically. The algorithms in XLE that are key to its high-speed performance are optimized for the situation where categorial annotations are relatively independent of the annotations on distant categories, that annotations interact in a mostly context-free-like way (Maxwell and Kaplan 1996). The incremental enumeration of equivalence classes in a finite-state setting should also benefit implicitly from this property. XLE operates directly on the finite-state machine representation of the grammar, interpreting on the fly the alternative transitions leaving a state. If an LCFRS parsing algorithm can be augmented to also operate in this way, it may never be necessary to read out the full set of paths.
Because of the complexity of the regular expressions and the massive use of disjunctions, |G′| can still be impractically large. Thus, for realistic grammars it may be worth applying an alternative strategy to parsing that avoids constructing the LCFRS for all rules and features of G and instead builds an LCFRS at parse-time only for a given input. This strategy relies on the fact that for many grammar formalisms, such as context-free grammars, the result of the intersection of a language L(G) with a singleton string-set {s} is describable as a specialization G_s of G that assigns to s effectively the same parses as G would assign (Bar-Hillel et al. 1961; Lang 1994). The parsing problem is divided into four steps. In the first step, the LFG G is specialized to an LFG whose context-free backbone language consists only of the given input string s (if s belongs to the backbone language of G), as Lang and others have pointed out. This can be done in cubic time by any number of context-free parsing algorithms, modified simply to record the annotations associated with the nonterminal categories. The size of the resulting grammar G_s is proportional to |s|^3, and G_s is k_s-bounded if G is k-bounded, for k_s ≤ k. The other parameters affecting LCFRS size and parsing complexity (g_s, c_s, and a_s) are also typically much smaller than those for G (for example, g_s is less than 4 for English sentences that do not contain bet). In the second step, G_s is translated into the equivalent but correspondingly small LCFRS just as described earlier. In the third step (recognition) an LCFRS recognition algorithm determines whether there is at least one parse that satisfies the restrictions defined by the original annotations. The effect is that s ∈ L(G) iff s belongs to the language of the LCFRS constructed from G_s. In the last step (enumeration) f-structures from alternative parses can be produced one by one.
This strategy has the practical advantage that the LCFRS translation does not involve rules or features (including predicates and their subcategorization frames) that are not relevant to the given input. Notice that the rules for the specialized grammar are no longer in regular-expression/finite-state machine format, and sophisticated algorithms that operate on finite-state machine encodings of LCFRS rules will provide little computational benefit.
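The first step of this strategy, specializing the context-free backbone to a single input string, can be sketched with a CKY-style chart. The sketch makes several simplifying assumptions that are not part of the article: the backbone is in Chomsky normal form, annotations are carried along as opaque strings, rules that are derivable bottom-up but unreachable from the start item are not pruned, and the name `specialize` and the toy grammar are hypothetical.

```python
def specialize(binary_rules, lexical_rules, start, words):
    """Return backbone rules specialized to the input string.

    binary_rules: list of (mother, (left, ann_left), (right, ann_right)).
    lexical_rules: list of (category, word).
    Specialized categories are triples (category, i, j) covering words[i:j].
    Returns [] if the string is not in the backbone language.
    """
    n = len(words)
    chart = set()   # span-indexed categories derivable bottom-up
    rules = []      # specialized rules, annotations recorded unchanged
    for i, word in enumerate(words):
        for cat, form in lexical_rules:
            if form == word:
                chart.add((cat, i, i + 1))
                rules.append(((cat, i, i + 1), (word,)))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for mid in range(i + 1, j):
                for mother, (left, ann_l), (right, ann_r) in binary_rules:
                    if (left, i, mid) in chart and (right, mid, j) in chart:
                        chart.add((mother, i, j))
                        rules.append(((mother, i, j),
                                      (((left, i, mid), ann_l),
                                       ((right, mid, j), ann_r))))
    return rules if (start, 0, n) in chart else []


# Hypothetical toy grammar, for illustration only.
BINARY = [("S", ("NP", "(↑ subj)=↓"), ("VP", "↑=↓")),
          ("VP", ("V", "↑=↓"), ("NP", "(↑ obj)=↓"))]
LEXICAL = [("NP", "she"), ("V", "sees"), ("NP", "stars")]
specialized = specialize(BINARY, LEXICAL, "S", ["she", "sees", "stars"])
```

The three nested span loops give the cubic dependence on the length of the input, and only rules and lexical entries that match the input ever appear in the result.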
As mentioned, parsing and generation systems, for example the XLE system, have been developed that are practical for broad-coverage grammars and naturally occurring sentences (Crouch et al. 2008). XLE is based on the lazy contexted constraint satisfaction method developed by Maxwell and Kaplan (1991, 1996) that is optimized for context-free structures in which disjunctions arising from words and phrases that are distant from each other in the string do not interact. This is because XLE multiplies disjunctions at context-free constituents only if needed. Moreover, XLE does not require the regular right sides to be linearized. These features of the XLE parsing algorithm, which have proven extremely effective, suggest a third, presumably more efficient parsing strategy.
If the core XLE strategy is modified so that it takes account of the zippers (of the reentrancy-free kernel) before the contexted constraint satisfaction algorithm is applied for testing clash-freeness, the algorithm would also take advantage of the multiple context-freeness (of natural languages). Then, for k-bounded LFGs the modified algorithm would also perform well for discontinuous constructions, and only the relatively rare multiply embedded control constructions could force the algorithm to expand into DNF the specifications of the broader scopes associated with the controller and controllees.
7. Concluding Remarks
The k-bounded LFG grammars form a decidable proper subclass of the LFG formalism, restricted both in notation and in the properties of their derivations. We have suggested that the notation for annotating individual c-structure categories is still linguistically suitable in that it allows for function assignments, trivial annotations to identify heads and coheads, long (but respecting Functional Locality) reentrancies for control, height-bounded functional domains, and value specifications for feature instantiation and agreement. Further, the derivational conditions of Definitions 13 and 14 provide a technical characterization of the informal linguistic notion that reentrancies in general are not constructive. Whereas off-line parsability ensures only that the parsing problem is decidable, the restrictions developed in this article are sufficient for the decidability of the emptiness and generation problems as well. We demonstrated that these requirements make possible the translation into equivalent LCFRSs, and we explored the effectiveness of some alternative strategies for LFG parsing.
We have focused in this article on the formal devices proposed by Kaplan and Bresnan (1982) and still in common use. Modern LFG, however, includes a number of more sophisticated mechanisms that were later on woven into the theory. Among these are devices for the f-structure characterization of long distance dependencies and coordination: functional uncertainty (Kaplan and Zaenen 1989; Kaplan and Maxwell 1988a), set distribution for coordination, and the interaction of uncertainty and set distribution (Kaplan and Maxwell 1988b); and devices whose evaluation depends on properties of the c-structure to f-structure correspondence, namely, functional categories and extended heads (Zaenen and Kaplan 1995; Kaplan and Maxwell 1996) and functional precedence (Bresnan 1995; Zaenen and Kaplan 1995).
We know that some of these devices are needed for the analysis of phenomena that can only be modeled by tractable extensions of LCFRS but not LCFRS per se (Becker et al. 1992; Boullier 1999; Kallmeyer 2010a). Boullier (1999) has demonstrated that range concatenation grammars (RCGs) can handle long-distance scrambling (in German) and Kallmeyer (2010a) that gapping in coordinated structures can be modeled by literal movement grammars of constant non-linearity. We thus conjecture that the LFG devices used for describing these phenomena can, at least to the extent needed in order to provide linguistically motivated analyses, be captured in these extensions of LCFRS. Some evidence for this conjecture is provided by Rambow (1997), who sketches how to model functional uncertainty in unordered vector grammars with dominance links, a TAG-related (tractable) framework. Peled and Wintner (2015) describe a unification-based formalism that is equivalent to RCG. Although this formalism differs considerably from the more elaborate versions of the LFG formalism that include a number of separate but interacting mechanisms, their restrictions might still help in modeling these LFG mechanisms in tractable extensions of LCFRS.
The LFG formalism has proven to be rich enough in its descriptive power to enable concise characterizations of complex syntactic dependencies in languages of many different types. The formalism is also rich enough so that many computational problems are undecidable and others are intractable. The question we have addressed here is whether there are limitations in the notation or its interpretation that eliminate the computational excesses of the formalism while preserving its descriptive utility, and perhaps even provide a deeper computational explanation for some principles of linguistic organization. Taking the LCFRS formalism and the class of multiple context-free languages as the touchstone of our investigation has enabled us to specify a small number of conditions that are sufficient for tractability and may still be linguistically suitable. One consequence is that the methods and formal results pertaining to multiple context-free languages may lead directly to a better understanding of other aspects of LFG and perhaps other unification-based formalisms. This may also lead to incremental improvements in existing LFG systems, such as XLE. At a higher level, the interplay between the descriptive succinctness of the LFG formalism and its mildly context-sensitive equivalents may help to define a tighter formal envelope around the class of natural languages.
Appendix A. The Elimination of Trivial Annotations
Given the height bound of Definition 10, we can remove all trivial annotations. For the elimination we assume that any occurrence of ↓ in the annotated reentrancies and atomic-valued annotations of a nonterminal carrying ↑ = ↓ is replaced by ↑. This replacement is legitimate because it is equivalence-preserving and the resulting annotations are admissible.
Lemma 1
For any suitable LFG G we can construct a suitable LFG without ↑ = ↓ annotations, denoted by G∖↑=↓, such that ΔG = ΔG∖↑=↓.
Proof
Let G = (N, T, S, R) be a suitable LFG grammar. Without loss of generality assume that the reentrancies and atomic-valued annotations of nonterminals annotated with ↑ = ↓ do not contain ↓. Now let R̂ be the smallest set that includes R and that is closed under the following rule: if A → ϕ(B, D)ψ ∈ R̂, with ↑ = ↓ ∈ D, and B → (X1, D1)ω ∈ R̂, then A → ϕ(X1, D1 ∪ (D∖{↑ = ↓}))ωψ ∈ R̂. Define R′ as the subset of R̂ that only includes rules A → ϕ where for any (B, D) in ϕ with B ∈ N, ↑ = ↓ ∉ D. By the condition of Definition 10, R′ and R̂ are finite. Set G∖↑=↓ = (N, T, S, R′). Obviously, G∖↑=↓ is suitable. Then ΔG ⊆ ΔG∖↑=↓ by a simple bottom–up induction on the derivations in G and ΔG∖↑=↓ ⊆ ΔG because for any rule r = A → (X1, D1)..(Xm, Dm) in R′ there is by construction of R′ a corresponding derivation of X1..Xm from A in G that yields the same f-description as r if every instantiated trivial annotation n = nj is removed and the instantiating daughter node nj is eliminated by substitution [nj/n].
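The closure used in this proof can be sketched as a naive fixpoint computation. The encoding is an assumption made for illustration only: a rule is a pair of a left-hand category and a tuple of (category, annotation set) daughters, annotations are opaque strings, and `eliminate_trivials` is a hypothetical name. The loop terminates only when trivial annotations are height-bounded, as Definition 10 requires.

```python
TRIVIAL = "↑=↓"


def eliminate_trivials(rules, nonterminals):
    """Close the rule set under daughter promotion, then keep only the
    rules that have no trivially annotated nonterminal daughter."""
    closure = set(rules)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in list(closure):
            for pos, (cat, anns) in enumerate(rhs):
                if cat not in nonterminals or TRIVIAL not in anns:
                    continue
                for lhs2, rhs2 in list(closure):
                    if lhs2 != cat or not rhs2:
                        continue
                    (x1, d1), rest = rhs2[0], rhs2[1:]
                    # The first promoted daughter inherits the remaining
                    # annotations of the eliminated category.
                    promoted = ((x1, d1 | (anns - {TRIVIAL})),) + rest
                    new_rule = (lhs, rhs[:pos] + promoted + rhs[pos + 1:])
                    if new_rule not in closure:
                        closure.add(new_rule)
                        changed = True
    return {rule for rule in closure
            if not any(cat in nonterminals and TRIVIAL in anns
                       for cat, anns in rule[1])}


def d(*annotations):
    """Helper: build an annotation set."""
    return frozenset(annotations)


# Hypothetical toy grammar with an annotated terminal "sees".
NONTERMINALS = {"S", "NP", "VP", "V"}
RULES = {
    ("S", (("NP", d("(↑ subj)=↓")), ("VP", d(TRIVIAL)))),
    ("VP", (("V", d(TRIVIAL)), ("NP", d("(↑ obj)=↓")))),
    ("V", (("sees", d("(↑ pred)='see'")),)),
}
trivial_free = eliminate_trivials(RULES, NONTERMINALS)
```

For this toy grammar the result contains the flattened rule that expands S directly into the subject NP, the annotated terminal, and the object NP, alongside the trivial-free rules for VP and V.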
Appendix B. Suitable LFGs with (Finitely) Bounded Reentrancy-Free Kernel
A suitable LFG G has a bounded reentrancy-free kernel if there is a k such that for every derivation (c, ρ) of a terminal string from S with f-description FD and clash-free FD∖𝓡 in G∖↑=↓, |{n′ ∣ FD∖𝓡 ⊢ n = n′}| ≤ k, for any node n of c. The decidability proof for this boundedness problem is (similar to that of the k-boundedness problem presented in Section 4.1) based on an enumeration of all derivations of G∖↑=↓ up to a certain depth. Here, however, we attempt to find a derivation that can be “pumped”, that is, one that can be used to show that the equivalence classes are not finitely bounded, and not just a derivation with an equivalence class greater than some given k. Before we present the proof we introduce the pumping operation.
Let (c, ρ) be a derivation and [] and [nl] be two node classes where at least one node in [] dominates a node in [nl]. Suppose that h assigns to [] and [nl] the same set of atomic-valued schemata and there is a mapping g from [nl] to [] such that for all ni in [nl], g(ni) and ni are licensed by the same rule. Now let δ be a function that maps the nodes in [nl] to their dominating nodes in []. Then we can pump the derivation if there is a nonempty subset V of [nl] such that the image of V under g (g(V)) is smaller than V, and the nodes in V are dominated by the nodes in the image of V under g (g(V) = δ(V)). This situation is illustrated in the left-hand derivation tree of Figure 11. For defining the pumped derivation, we make use of the Repl operator specified in Section 4.1. Figure 11 illustrates on the right the result of the pumping process for the derivation tree on the left. The pumping operation (pu) is formally defined as follows.
Definition 26
- (i) h([]) = h([nl]),
- (ii) there is a function g from [nl] to [] such that
  - (a) ρg(ni) = ρni, for each ni ∈ [nl], and
  - (b) there is a nonempty subset V of [nl] such that |g(V)| < |V| and g(V) = δ(V),23
For any given derivation (c, ρ) with two equivalence classes [nl] and [] satisfying the conditions of Definition 26, pu(c, ρ)([], [nl]) is also a derivation of G∖↑=↓ that by construction still gives rise to the equivalence classes [nl] and []. Now let g be the mapping from [nl] to [] as defined for the original derivation (shown in the right derivation of Figure 11). Because the original derivation satisfied ρg(ni) = ρni, for each ni ∈ [nl], and we expanded every ni by the subderivation rooted at g(ni), the new derivation must also satisfy ρg(ni) = ρni. Hence also h([nl]) = h([]) and, as a consequence, the f-description that G’s reentrancy-free kernel provides for this derivation is clash-free. (Note that this would not be guaranteed if g were not total.) Thus, the classes [nl] and [] of pu(c,ρ)([], [nl]) satisfy the conditions of Definition 26 and the pumping operation can be applied again.
Because the renamed ni from V of the pushed down copies of the value derivations must be equivalent (as in the example of Figure 11) and g is required to satisfy |g(V)| < |V|, the number of these equivalent copies must be greater than |V|. Thus, by further pumping, the number of equivalent nodes that trace back to V through renaming must stepwise grow. Under these conditions the reentrancy-free kernel of G is not finitely bounded.
Lemma 9
The reentrancy-free kernel of a suitable LFG G is unbounded if and only if there is a derivation (c, ρ) with clash-free FD∖𝓡 in G∖↑=↓ and |[n]| > bd, with d = ⋅ |𝓔|, for at least one node n of c.
Proof
Let (c, ρ) be a derivation in G∖↑=↓ with clash-free FD∖𝓡 and suppose that |[n]| > bd for some node n of c. Let root = 0..s = n be the path to n with immediately dominating . For each i = 0, .., s − 1, let be the nodes of [] that dominate the nodes in []. For s, set = []. Now let f be a function that assigns to each [], j = 0, .., s, the triple (ρ([]), ρ(), h([])) consisting of the rules that license the nodes in [], the subset of those rules that license the nodes of [] that dominate the nodes in [], and the description assigned to []. Since |[]| > bd, there must be nodes , .., = on the path .. with q > d and || < ||, for j = 1, .., q − 1. Because, for each f([]) = (P, Q, E), j = 1, .., q, P is a nonempty subset of R, Q is a nonempty subset of P, and E is an element of 𝓔 the number of distinct f values is bounded by d. Thus, if q > d, there must be two classes [], [] that (with V = ) satisfy the conditions of Definition 26.
Given the proofs of the pumping lemma and the decidability of the k-boundedness (Lemma 3), it is now easy to see that finite boundedness can be determined by inspecting all derivations of (c-structure) depth less than or equal to 2d + |Pow+(R) × 𝓔|. We show here by contradiction that it can be decided whether there is a derivation in G∖↑=↓ that satisfies the conditions of Definition 26 by inspecting G∖↑=↓’s derivations only up to depth 2d − 1. The rest of the proof (that justifies |Pow+(R) × 𝓔| + 1) is similar to the proof of Lemma 3. Thus suppose there were a derivation (c, ρ) in G∖↑=↓ with clash-free FD∖𝓡 with a path root = .. with s ≥ 2d and the pair [], [], l < s, but no higher pair would satisfy the conditions of Definition 26. Because s ≥ 2d, there must be a pair of classes [], [], i < j < s such that f([]) = f([]) and either l ≤ i or l ≥ j. If this pair does not allow pumping then || = || and there must be a g from [] to [] that includes and allows it to shrink the derivation. Shrinking then lifts the pair of classes satisfying the conditions of Definition 26, which leads to a contradiction with our assumption.
If the reentrancy-free kernel of G∖↑=↓ is finitely bounded then the k bound is the maximum size of the equivalence classes of all these derivations.
Lemma 10
For any suitable LFG G it is decidable whether there is a k such that for every derivation (c, ρ) of a terminal string from S with f-description FD and clash-free FD∖𝓡 in G∖↑=↓, |{n′ ∣ FD∖𝓡 ⊢ n = n′}| ≤ k, for any node n of c.
Acknowledgments
The authors would like to thank the three anonymous reviewers for their helpful suggestions and comments on earlier drafts of this article.
Notes
LFG’s generation problem is only undecidable for cyclic f-structures. If only acyclic f-structures are considered the problem is decidable even for grammars that are not off-line parsable (Wedekind and Kaplan 2012).
Long reentrancies can also be used for agreement of atomic values (e.g., (↑ subj num) = (↑ num)). However, because there is a finite bound on the number of agreement features, agreement can always be equivalently encoded through collections of atomic-valued annotations and thus requires no special consideration.
Empty nodes are disallowed in some modern versions of LFG, particularly when long-distance dependencies are characterized by functional uncertainty rather than traces (Kaplan and Zaenen 1989; Dalrymple et al. 2015). Unless the theory retains some variant of the original Off-line Parsability Constraint, the undecidability of the recognition problem can be demonstrated even in the absence of empty nodes.
Such an encoding has been used in the undecidability proof of the generation problem (from cyclic f-structures) in Wedekind (2014). This proof depends on annotations that lie outside of the suitable notation, but the argument can easily be reformulated for suitably annotated LFGs.
Feinstein and Wintner (2008) show that unification grammars are equivalent to linear indexed grammars (LIGs) if, for each rule, the mother feature structure is required to share at most one structure with the daughter feature structures. As indicated by the proper inclusion of LIGs in LCFRSs, the use of reentrancies is overly restricted in this formalism.
The collection of trivial-annotated nodes that map to the same f-structure has been called a functional domain in linguistic theory (Bresnan 2001).
Instead of an explicit boundary condition as in Definition 10, the bound can also be specified as an extragrammatical parameter. We return to that option in Section 6.
Even though our example grammar (with only four lexical rules) yields a ↑ = ↓-free grammar with very few rules, the elimination of the trivial annotations can—without further restriction—easily result in a grammar that is substantially larger than the original. We return to this issue in Section 6.
Note that based on the short-reentrancy grammars of the reduction to the emptiness problem a similar argument can be used to show that, even for grammars with only short reentrancies, the nonconstructivity of reentrancies would be undecidable if in the definition all node equalities n = n′ following from FD (and not FD∖𝒜) were required to follow from FD∖𝓡.
We provide a technical interpretation of Coherence (and Completeness) in Section 5.4.
For k-bounded LFGs the universal generation problem is decidable as well. This follows directly from the decidability result in Wedekind and Kaplan (2012). That article describes an algorithm that produces for an arbitrary LFG grammar G and an arbitrary acyclic input f-structure F a context-free grammar GF that describes exactly the set of strings that the given LFG grammar associates with that f-structure. For k-bounded LFGs all node equalities follow from the instantiated function-assigning annotations of an f-description. Thus, even if the input f-structure contains cycles, the context-free grammar can be constructed on the basis of the terms that are defined in the canonical expansion of the f-structure but do not encompass a cycle. This is again a finite set. Then for any G and arbitrary F, G derives a terminal string with F iff L(GF) ≠ ∅ (which is decidable for context-free grammars).
We show in Appendix B that for any suitable LFG it is also decidable whether or not there is a finite bound on its reentrancy-free kernel.
It may be the case that such a left-branching proof must start with a reflexive equation tm+1 = tm+1 that is not in FD but can be inferred by partial reflexivity from an equation (tm+1 σ) = t″ in FD. Partial reflexivity is the restriction of reflexivity to well-defined (object denoting) terms. It is a sound inference rule for the theory of partial functions for which full reflexivity does not hold.
We demonstrate this for a proof of an equation of the form (n σ) = (n′ σ′). For equations of the form (n σ) = v′, v = (n′ σ′), and v = v′ the proof is similar. Obviously, for any proof of (n σ) = (n′ σ′) in †FD we obtain a proof of ([n] σ) = ([n′] σ′) in sFD just by replacing the equivalence classes for the nodes.
Now consider a proof of ([n] σ) = ([n′] σ′) of the form (18) in sFD. Choose a canonical representative from each class and substitute the classes by their representatives. For each premise ( χ) = (′ χ′) (with ([] χ) = ([′] χ′) in sFD) that is not in †FD, there must be an equation (ň χ) = (ň′ χ′) in †FD with ň ∈ [] and ň′ ∈ [′]. Since ň = and ň′ = ′ follow from FD∖𝓡, we obtain a proof in †FD by replacing each such premise ( χ) = (′ χ′) by a proof of ( χ) = (′ χ′) from (ň χ) = (ň′ χ′) where ň is rewritten to and ň′ to ′ by a sequence of substitutions all justified by equations in FD∖𝓡.
Note that this result holds also for †FD∖𝒜 and sFD∖𝒜, because the function-assigning annotations that generate the equivalence classes are also included in the atom-free kernel.
If G does not contain reentrancies of the form (↑ f g) = (↑ h), (↑ f) = (↑ h), and (↓ g) = (↑ h), then, for the proof in (19), m = 1 and a = a′ follows from a = (a1 f1) and (a1 f1) = a′. Thus the node class equality must arise from an undesirable interaction between instantiations of cyclic or function-assigning annotations or annotations of the form (↓ g) = ↑. For such grammars G, the condition of Definition 13 can be decided by inspecting derivations of depth less than or equal to 2|𝓜≤k (R) × 𝓔| + 1.
Because of the elimination of annotated terminals, the LFG rules are partitioned into unannotated terminal rules and nonterminal rules with nonterminal or preterminal daughters.
The following argument applies equally to open adjuncts (xadjuncts).
If (a σ) = v ∈ and a ≐ b ∈ FD, then (b σ) = v ∈ .
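A closure condition of this shape can be computed as a simple fixpoint. The sketch below is illustrative only: the set being closed is called `facts` (a hypothetical name), each equation (a σ) = v is encoded as a triple `(a, sigma, v)`, and each a ≐ b is read directionally, exactly as the condition is stated, so a symmetric relation would have to supply both `(a, b)` and `(b, a)`.

```python
from typing import Iterable, Set, Tuple

Fact = Tuple[str, str, str]  # (node, path, value), read as (node path) = value


def close_under_node_equalities(
    facts: Iterable[Fact], node_equalities: Iterable[Tuple[str, str]]
) -> Set[Fact]:
    """Add (b, path, value) whenever (a, path, value) is present and a ≐ b holds.

    The loop repeats until no new fact is added, so chains of equalities
    such as a ≐ b and b ≐ c are followed to the end.
    """
    equalities = list(node_equalities)
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        for a, b in equalities:
            for node, path, value in list(closed):
                if node == a and (b, path, value) not in closed:
                    closed.add((b, path, value))
                    changed = True
    return closed


# Hypothetical example: the value of (n1 SUBJ NUM) is copied to n2 and then n3.
closed = close_under_node_equalities(
    {("n1", "SUBJ NUM", "sg")}, [("n1", "n2"), ("n2", "n3")]
)
```

The loop terminates because only finitely many triples can be formed from the given nodes, paths, and values.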
Note that the Ě components also include reflexive equations of the form (* σ) = (* σ) whose class instantiations follow from the sFD of the simulated derivations. This makes it possible to account also for existential constraints whose satisfying feature paths do not arise from atomic-valued equations.
Without appealing to further linguistic restrictions, G′ can be exponentially larger than G (as indicated by the fact that the universal recognition problem for k-bounded LFGs, with attributes drawn from an unbounded set, is NP-complete). Thus, even though k-bounded LFGs are weakly equivalent to k-LCFRSs and k-LCFRSs can be recognized in polynomial time, LCFRS recognition algorithms may, without further restrictions, fail to be practical for equivalent k-bounded LFGs. This situation resembles that of Generalized Phrase-Structure Grammar (GPSG), where a corresponding apparent recognition paradox is likewise due to the effect of grammar size on recognition performance (see Ristad 1987 for details): any particular GPSG grammar can be converted into an equivalent context-free grammar, but recognition with the equivalent grammar may be impractical because of its size. And even with fixed bounds on the height of a functional domain, the rank and the maximum number of trivially annotated categories in a rule, the number of attributes and values, and the length of the atomic-valued annotations, the k-LCFRS G′ can in principle be astronomically larger than the k-bounded LFG G.
Kaplan and Wedekind (2019) observe that the schematic prescriptions of X-bar theory allow for arbitrary repetitions of discontinuous branches with complement and cohead nodes that map to the same f-structure and thus give rise to functional domains that appear to exceed any finite height bound and equivalence classes that are not k-bounded. Linguists may not be aware that their succinct X-bar grammars admit strings with unbounded derivations that are not accepted in the language, and that this descriptive inadequacy may be accompanied by undesirable computational properties, as we have outlined in this article. Kaplan and Wedekind distill the formal issue down to a simple empirical question about the magnitude of two extragrammatical parameters, the cohead repetition limit and the degree of discontinuity. This question has not yet been the focus of specific linguistic investigations.
Note that the specification of V implies that at least one value branches into two or more nodes of V.