Abstract
This article describes an approach to Lexical-Functional Grammar (LFG) generation that is based on the fact that the set of strings that an LFG grammar relates to a particular acyclic f-structure is a context-free language. We present an algorithm that produces for an arbitrary LFG grammar and an arbitrary acyclic input f-structure a context-free grammar describing exactly the set of strings that the given LFG grammar associates with that f-structure. The individual sentences are then available through a standard context-free generator operating on that grammar. The context-free grammar is constructed by specializing the context-free backbone of the LFG grammar for the given f-structure and serves as a compact representation of all generation results that the LFG grammar assigns to the input. This approach extends to other grammatical formalisms with explicit context-free backbones, such as PATR, and also to formalisms that permit a context-free skeleton to be extracted from richer specifications. It provides a general mathematical framework for understanding and improving the operation of a family of chart-based generation algorithms.
1. Introduction
Algorithms providing compact representations of alternative syntactic analyses have been the state of the art in parsing for many years. For context-free grammars, for example, the well-known chart parsing algorithms have been used for more than four decades. These assign to a sentence not just one possible analysis but a chart that compactly represents all possible syntactic analyses. Algorithms have also been developed that extend packing to the functional specifications of unification grammars by producing compact representations of feature-structure ambiguities as well. One that is pertinent to (but not restricted to) Lexical-Functional Grammar (LFG) is the contexted constraint satisfaction method developed by Maxwell and Kaplan (1991). These algorithms lead to better average-case time performance because they carefully manage the ambiguities that are rampant in natural language. They work by dividing the parsing problem into two phases, a recognition or satisfiability phase that creates the compact representation and determines whether there is at least one parse, and an enumeration phase in which the alternative parses are produced one by one. Parsing performance is typically identified with the complexity of the first phase (e.g., the cubic bound for context-free parsing), because the collection of all parses can be delivered to a client application merely by presenting the compact representation. A client may be able to select a limited number of particularly desirable parses, perhaps the smallest or the most probable, without doing a full enumeration (Johnson and Riezler 2002; Kaplan et al. 2004).
Lang (1994) gives a clear formal characterization of the first phase of context-free chart parsing.1 He observes that the recognition problem consists of finding the intersection of the language of the grammar with the input string, and then testing to see whether that intersection is empty. Many language classes are closed under intersection with a regular set, and the result of the intersection of a language L(G) with a regular language α is describable as a specialization Gα of G that assigns to all and only the strings in α effectively the same parse trees as G would assign. Lang argues that a chart for an input string s (a trivial regular language) and a context-free grammar G can be regarded as a specialization Gs of G that derives either the empty language (if s does not belong to L(G)) or a language consisting of just that input. In this view a parsing chart/grammar is a representation that makes it possible to enumerate all the derivation trees of the string, guaranteeing that each tree can be produced in a backtrack-free way in time proportional to its size. This guarantee holds even for an infinitely ambiguous string: It would take forever to enumerate all valid derivations, but any particular one can be read out in linear time. The procedure for tree enumeration follows directly from the standard context-free generation algorithm applied to the grammar Gs.
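The view of a chart as a specialized grammar can be made concrete with a small sketch. The Python fragment below is not Lang's algorithm and not an efficient chart parser; it is a naive rendering of the classical intersection construction, written under the assumptions that terminals and nonterminals are disjoint and that rules are given as (lhs, rhs) pairs. The function name `specialize` and the encoding of refined categories as triples (X, i, j), meaning that X spans positions i to j of the input, are hypothetical choices made only for illustration.

```python
from itertools import combinations_with_replacement

def specialize(rules, start, s):
    """Specialize a CFG (list of (lhs, rhs-tuple)) to the token list s.

    Refined categories are triples (X, i, j): X covers s[i:j].
    Assumes terminal and nonterminal symbols are disjoint.
    """
    n = len(s)
    nonterminals = {lhs for lhs, _ in rules}
    # One candidate per original rule and per nondecreasing choice of cut points.
    candidates = []
    for lhs, rhs in rules:
        for cuts in combinations_with_replacement(range(n + 1), len(rhs) + 1):
            spans = list(zip(rhs, cuts, cuts[1:]))
            # A terminal daughter must cover exactly its own token.
            if all(x in nonterminals or (j == i + 1 and s[i] == x)
                   for x, i, j in spans):
                candidates.append(((lhs, cuts[0], cuts[-1]), tuple(spans)))

    def ok(d, productive):
        return d[0] not in nonterminals or d in productive

    # Fixpoint: a refined category is productive if some rule for it has
    # only terminal or productive daughters.
    productive, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in candidates:
            if lhs not in productive and all(ok(d, productive) for d in rhs):
                productive.add(lhs)
                changed = True
    useful = [(lhs, rhs) for lhs, rhs in candidates
              if lhs in productive and all(ok(d, productive) for d in rhs)]

    # Keep only rules reachable from the refined start category.
    reachable, agenda = set(), [(start, 0, n)]
    while agenda:
        cat = agenda.pop()
        if cat in reachable:
            continue
        reachable.add(cat)
        for lhs, rhs in useful:
            if lhs == cat:
                agenda.extend(d for d in rhs if d[0] in nonterminals)
    return [(lhs, rhs) for lhs, rhs in useful if lhs in reachable]
```

For the toy grammar S → S S | a and the input a a a, the result contains seven rules, two of which expand (S, 0, 3); these correspond to the two bracketings of the string. For an input outside the language the result is the empty grammar.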
The generation problem for LFG and other description-based grammatical formalisms can also be viewed from this perspective. Several algorithms have been proposed for generation that avoid redundant recomputation by storing intermediate processing results in a chart-like auxiliary data structure (e.g., Shieber 1988; Kay 1996; Shemtov 1997; Neumann 1998; Carroll et al. 1999; Moore 2002; Carroll and Oepen 2005; Cahill and van Genabith 2006; White 2006; de Kok and van Noord 2010). Most of them can be construed as having a first phase that provides a compact representation for alternative results, in this case for the strings that the grammar provides for a given functional or semantic input. The individual generated strings are then produced by an enumeration procedure operating on this compact representation.
In this article we observe that the edges of a generation chart can be interpreted as rules of a specialized context-free grammar, just as in Lang's (1994) characterization of parsing. We present a generation algorithm that specializes the context-free backbone of a given LFG grammar to a grammar that describes exactly the strings that the LFG grammar relates to a given acyclic f-structure. Derivations of the resulting grammar simulate all and only those derivations of the LFG grammar whose derived strings are assigned to that input.2 Thus the generated string set is a context-free language compactly represented by the specialized grammar, and the individual members of that language can be enumerated, just as for parsing, by using standard context-free generation algorithms.
Our approach can be seen as a generalization and formalization of other chart-based generation algorithms, producing all and only correct outputs for larger combinations of grammars and inputs. It extends to unification grammars with explicit context-free backbones, such as PATR (Shieber et al. 1983), and also to formalisms that permit a context-free skeleton to be extracted from richer specifications. But it does not extend to cyclic input structures because, as we will show by example, an LFG grammar might relate to a cyclic structure a set of strings that is not context-free. Because acyclic structures are normally assumed to be the only f-structures that are motivated for linguistic analysis (Kaplan and Bresnan 1982), this restriction does not seem to limit the applicability of our algorithm for natural language generation.
We begin with some background so that we can make the problem and its solution more explicit. Along with many other description-based grammar formalisms, an LFG grammar G assigns to every string s in its language at least one f-structure. This situation can be characterized in terms of a derivation relation ΔG, defined as follows:
- (1) ΔG(s, F) iff G assigns to the string s the f-structure F
- (2)
In accordance with the basic architecture of LFG, an LFG grammar provides a set of licensing conditions that determine grammatical representations by descriptive, model-based rather than procedural methods. The well-formedness of the representation in Figure 1 with respect to the grammar in (2) is thus characterized as follows.
The c-structure is valid or well-formed because we can assign to each nonterminal node a grammar rule that licenses or justifies the local mother–daughters configuration constituted by the node and its immediate daughters. If we assume that the c-structure of Figure 1 consists of the nodes root, n1, .., n8 and these nodes are related and labeled as depicted in Figure 2, then the rule-mapping ρ that justifies the c-structure is given by (3).
- (3)
A description of the f-structure (called the f-description) for this tree and rule-mapping is constructed by instantiating the annotations of all justifying rules in the following way. For each rule justifying a local mother–daughters configuration, all occurrences of the ↑ symbol (called a metavariable) in the functional annotations of the daughters are replaced by the mother node, and for each of the daughter categories, all occurrences of the ↓ metavariable in its annotations are replaced by the corresponding daughter node.3 Thus, the ↓ of the annotations on a daughter category of a rule and the ↑ of the annotations of the rule that further expands that category are always instantiated with the same node. The complete f-description is the union of the instantiated descriptions of all the justifying rules. The f-description obtained from the c-structure in Figure 1 and the rules of the justifying mapping in (3) is given in (4).
- (4)
- (5)
The solution in (5) is converted to the f-structure representation in Figure 1 by removing the node labels that record the relation of the f-structure to the c-structure. From a formal point of view, an f-structure is obtained from the minimal model of an f-description by restricting its interpretation function to the attributes and atomic feature values of the grammar, thus disregarding the nodes and their interpretation.5
We now turn to the generation problem. A generator for G provides for any given f-structure F the set of strings that are related to it by the grammar:
- (6) GenG(F) = {s | ΔG(s, F)}
The abstract generator characterization in (6) is of course dual to the one for a parser for G, since a parser produces for any given terminal string s the set of f-structures that are assigned to it by G:
- (7)
- (8)
- (9) [H V]
From a cognitive point of view it seems unrealistic that the number of sentences that a natural language grammar relates to an f-structure is infinite. At a minimum, there should be some relationship that bounds the size of the c-structure of a sentence by the size of the f-structures associated with it. Such a structural relationship would then force the related sentences to form a finite set. Studies to determine intuitively plausible restrictions are rather scarce, however, and proposals for such restrictions are not yet generally accepted. It is thus still an open question whether grammars of actual natural languages satisfy the particular resource-boundedness restrictions on which termination of some existing chart-based generators depends.
Even if only finite sets of sentences are related to the f-structures, these sets might still be very large. Experiments with a broad-coverage German LFG grammar (Dipper 2003; Rohrer and Forst 2006) have shown that, because of the scrambling that German allows, a given f-structure might be related to a huge set of long sentences.6 This has consequences at least for those approaches that assume the output of generation to be a word lattice (Langkilde and Knight 1998) or a finite-state machine representing the (finite) set of all generated sentences. A lattice can represent a large collection of strings compactly only if they are characterized by independent sets of alternative substrings. Scrambling languages, however, have alternative substrings that reappear in different positions with complex cooccurrence dependencies and therefore cannot be shared in a lattice representation (see Langkilde [2000] for discussion). Our context-free grammars (and also Langkilde's [2000] and Knight and Langkilde's [2000] forest representations) offer a much more compact encoding under these circumstances, and their structure and formal properties are as well understood as those of lattices and finite-state machines.
Our approach might also be more appropriate than existing chart-based approaches for optimality-theoretic generation (Kuhn 2001, 2002, 2003). An optimality-theoretic LFG system consists of two components: a universal LFG grammar and a language-specifically ordered set of violable constraints (Bresnan 2000). The universal LFG grammar is used to produce the candidate space of possible analyses (consisting of the c-structure/f-structure pairs that are derivable by the grammar). The optimal and thus grammatical analyses are those candidates that violate the fewest constraints. A technical problem comes from the fact that the universal grammar by design may assign an infinite number of c-structures and string realizations to a given f-structure, and the optimal outputs can be identified only by evaluating all of these against the collection of constraints. Our context-free characterization provides a finite evaluation procedure even for an infinite candidate space. By virtue of the pumping lemma for context-free languages (Bar-Hillel, Perles, and Shamir 1961; see also Hopcroft and Ullman 1979) we can enumerate the c-structure trees assigned to an input f-structure one by one in order of increasing depth. Because the number of constraint violations increases beyond a certain number of recursive category expansions, the optimal results from the infinite space can be chosen after examining only a finite number of relatively small structures (see Kuhn [2003] for details).
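The idea of examining candidates in order of increasing depth can be sketched as follows. This is only an illustration under stated assumptions: the grammar encoding as (lhs, rhs) pairs and the names `yields_up_to_depth` and `by_increasing_depth` are hypothetical, the sketch collects terminal yields rather than full c-structure trees, and it performs no constraint evaluation of the kind an optimality-theoretic system would need.

```python
from itertools import product

def yields_up_to_depth(rules, symbol, depth):
    """Terminal strings (as tuples) derivable from symbol by trees of depth <= depth.

    A terminal has depth 0; a node whose daughters are all terminals has depth 1.
    """
    nonterminals = {lhs for lhs, _ in rules}
    if symbol not in nonterminals:
        return {(symbol,)}
    if depth == 0:
        return set()
    results = set()
    for lhs, rhs in rules:
        if lhs != symbol:
            continue
        daughter_sets = [yields_up_to_depth(rules, x, depth - 1) for x in rhs]
        # Combine one yield per daughter; an empty daughter set gives no result.
        for combo in product(*daughter_sets):
            results.add(tuple(tok for part in combo for tok in part))
    return results

def by_increasing_depth(rules, start, max_depth):
    """Yield (depth, strings first reachable at that depth) for depth = 1..max_depth."""
    seen = set()
    for depth in range(1, max_depth + 1):
        new = yields_up_to_depth(rules, start, depth) - seen
        seen |= new
        yield depth, new
```

With the toy grammar S → a S | a, the first three steps deliver the strings a, a a, and a a a in that order, so a consumer can stop as soon as deeper candidates can no longer improve on the best one found.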
Similar to Lang's approach to parsing (see also Billot and Lang 1989), we provide a general framework encompassing all forms of chart generation in a single formalism. This is because existing chart-based generators can be understood as concrete but somewhat restricted algorithm/data-structure implementations of our context-free grammar construction. These restrictions may lead them to produce incorrect outputs in some situations. Because we show the correctness of the output grammar for unrestricted LFG grammars, our framework allows us to examine, compare, and improve on existing chart-based generation techniques.
The organization of this article is as follows. In the next section we define the fundamental formal objects of LFG theory and the relevant relationships among them. Section 3 is the technical core of the article. There we present and prove the correctness of the context-free grammar-construction algorithm for LFG grammars with arbitrary equational constraints and acyclic input f-structures. The grammar construction abstracts away from specific details of data structure and computational strategy not essential to the mathematical argument. Performance and computational strategy are then briefly considered in Section 4, and Section 5 compares our approach to other generation algorithms. In Section 6 we identify a fundamental limitation of our approach, demonstrating that the context-free property does not hold for elementary equational constraints if the input f-structure contains cycles. On the other hand, if the input is acyclic, the basic context-free construction can be extended beyond simple equations to the additional descriptive devices proposed by Kaplan and Bresnan (1982) and still in common use. This is shown in Section 7. The last section highlights some additional consequences of this approach.
The present article elaborates on ideas that we first presented in Kaplan and Wedekind (2000). In that paper we outlined a context-free grammar construction for a subclass of LFG grammars with restricted functional annotations and single-rooted input structures. Here we consider a more general class of grammars and inputs that requires a more rigorous mathematical analysis.
2. Preliminaries
We start with a formal characterization of LFG grammars with equational statements. Let V* denote the set of all finite strings over V. An LFG grammar G over a set Σ of attribute and value symbols is defined as follows:
Definition 1
We next define how instantiated descriptions are obtained from the rules by substituting for the ↑ and ↓ metavariables elements drawn from a collection of terms. C-structure nodes are included among the terms, but later on we also make use of additional elements. We define a function Inst that assigns to each m-ary rule r, term t, and term sequence t1..tm the instantiated description that is obtained from the annotations of r and the terms by substituting t for ↑ and tj for ↓ in the annotations of all j = 1,..,m daughters. In the following definition we use the (more compact) linear rule notation A → (X1,D1)..(Xm,Dm) that we prefer in more formal specifications.
Definition 2
The derivation relation for LFG grammars (ΔG) is defined as already described informally in the previous section. This is based on context-free derivation trees. Let us assume that root is the root node of any c-structure c, and that dts is a function that assigns to each nonterminal node n of c the sequence of its immediate daughters (dts(n)). Context-free derivations are then defined as follows:
Definition 3
A labeled tree c and a rule-mapping ρ from the nonterminal nodes of c into the rules of a context-free grammar G is a context-free derivation of string s from nonterminal B in G iff
the label (category) of root is B,
the yield is s,
for each nonterminal node n with label A and dts(n) = n1..nm with labels X1, ..,Xm, respectively, ρn = A → X1..Xm.
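The three conditions of Definition 3 translate directly into a check over a tree and a rule-mapping. The following sketch is illustrative only: the dictionary encoding of trees (`labels`, `dts`), the encoding of rules as (lhs, rhs) pairs, and the function name `is_cf_derivation` are assumptions introduced here, and the tree itself is assumed to be well-formed with terminal nodes labeled by their terminal symbols.

```python
def is_cf_derivation(rules, labels, dts, rho, root, B, s):
    """Check the conditions of a context-free derivation of s from B.

    labels: node -> category or terminal symbol
    dts:    nonterminal node -> list of its immediate daughters
    rho:    nonterminal node -> rule (lhs, rhs-tuple) that licenses it
    """
    def tree_yield(n):
        if n not in dts:                      # terminal node
            return [labels[n]]
        return [tok for d in dts[n] for tok in tree_yield(d)]

    if labels[root] != B:                     # the label of root is B
        return False
    if tree_yield(root) != list(s):           # the yield is s
        return False
    for n, daughters in dts.items():          # each local tree is licensed by rho
        expected = (labels[n], tuple(labels[d] for d in daughters))
        if rho.get(n) != expected or expected not in rules:
            return False
    return True
```

The final loop also verifies that each rule assigned by the mapping actually belongs to the grammar, which Definition 3 builds into the type of ρ.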
When we informally described LFG derivations, we pointed out that we obtain the f-structure from the (up to isomorphism) unique minimal model of the f-description by restricting it to the attribute–value set Σ. This is formalized in the following definition by requiring the f-structure to be isomorphic (≅) to M|Σ, the restriction to Σ of a minimal model M of the derived f-description. The effect of the isomorphism is to abstract away from the particular properties of different f-structure models that have no linguistic significance. Moreover, because we operate on an arbitrary member of the class of isomorphic structures without regard to any of its accidental or nonsignificant properties, we know that our analysis applies to all members of the class.
Definition 4
A labeled tree c and a mapping ρ from the nonterminal nodes of c into R is an LFG derivation of string s with functional description FD and f-structure F in LFG grammar G iff
(i) the label (category) of root is S,
(ii) the yield is s,
(iii) for each nonterminal node n with label A and dts(n) = n1..nm with labels X1, ..,Xm, respectively, ρn = A → (X1,D1)..(Xm,Dm),
(iv) FD is the union of the instantiated descriptions Inst(ρn, (n, dts(n))) of all nonterminal nodes n of c,
(v) FD is satisfiable,
(vi) FD ⊬ v = a, if v is an atomic feature value and a is any other constant (atomic feature value or node) occurring in FD,
(vii) FD ⊬ (v σ) = (v σ), if v is an atomic feature value and σ is a nonempty sequence of attributes,
(viii) M|Σ ≅ F, where M is a minimal model of FD.
Conditions (vi) and (vii) are syntactic versions of the constant/constant and constant/complex clash conditions that together capture LFG's functional uniqueness condition (the denotations of an atomic feature value and any other distinct atomic feature value or node constant have to be distinct (vi); atomic feature values have no attributes (vii)).8 A model of an f-description, like the restricted one in (viii), is a pair consisting of a universe U and an interpretation function I. The interpretation function assigns to each constant occurring in the f-description an element of U and to each attribute a unary partial function on U.
Note that we create the f-description by instantiating the ↑'s and ↓'s by the nodes of a given c-structure. Thus, we conceive of these terms as constants and will refer to them on the f-description level sometimes as node constants rather than nodes. Because the instantiating nodes are uniquely determined if we have a mapping ρ licensing a given c-structure, in the following we abbreviate Inst(ρn, (n, dts(n))) by Inst(ρn).
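The instantiation function can be sketched in a few lines. The encoding is an assumption made for illustration, not the article's formal definition: a term is a tuple whose first element is a metavariable or constant and whose remaining elements are attributes, an equation is a pair of terms, and a rule is a pair of a mother category and a list of (category, annotations) daughters. The names `subst` and `inst` are hypothetical.

```python
def subst(term, up, down):
    """Replace a leading metavariable in term by the term up or down."""
    head, path = term[0], term[1:]
    if head == "↑":
        return up + path
    if head == "↓":
        return down + path
    return term                     # atomic values and other constants are unchanged

def inst(rule, t, ts):
    """Instantiated description of rule for mother term t and daughter terms ts."""
    _, daughters = rule
    assert len(daughters) == len(ts), "one term per daughter is required"
    return {(subst(left, t, tj), subst(right, t, tj))
            for (_, annotations), tj in zip(daughters, ts)
            for left, right in annotations}
```

Applied to a start rule with the annotations (↑ subj) = ↓ on the first daughter and ↑ = ↓ on the second, with root as the mother and n1, n2 as the daughters, the sketch yields the description {(root subj) = n1, root = n2}. Because t may itself be a complex term, the same function also serves when terms other than node constants are used for instantiation.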
In general, two descriptions D and D′ are said to be equivalent (D ≡ D′) iff the restrictions of their minimal models to Σ are isomorphic.
Definition 5
Let D and D′ be two descriptions with minimal models M and M′. Then D ≡ D′ iff M| Σ ≅ M′|Σ.
From Definition 4 we obtain the derivability relation Δ as follows.
Definition 6
A terminal string s is derivable with f-structure F in G (ΔG(s,F)) iff there is a derivation of s with F (with some f-description FD) in G.
In this context we repeat the definition of the set of strings GenG(F) that an LFG grammar G relates to a given f-structure F:
Definition 7
For an LFG grammar G and an f-structure F, GenG(F) = {s | ΔG(s, F)}.
In the next section we establish the basic result of this article: We present an algorithm to construct for an arbitrary LFG grammar G and any acyclic f-structure F a particular context-free grammar that provides a formal representation for the language GenG(F).
3. Constructing the Specialized Grammar for GenG(F)
In the process of generation, the c-structures and the f-descriptions for an input f-structure F are the unknowns that must be discovered to confirm that a given string belongs to the set GenG(F). The set of valid c-structures that G provides for F is clearly a subset of the trees that are generated by the context-free backbone of G. But this subset might be infinite, as we have already seen with the input (9) and the grammar in (8), because there is in general no fixed finite upper bound on the length of the strings related to F or the size of their c-structures. Whether or not a given tree is a valid c-structure for F then depends on the properties of the f-description that arises by instantiating with the proper node constants the annotations on the individual rules that license the derivation of that tree. The valid c-structures are just those trees for which F is the f-structure of the resulting f-description.
Because of the possibly unbounded size of the c-structures, there is also no fixed upper bound on the number of node constants that may occur in an f-description for F. However, because the number of f-structure elements to which the node constants actually refer is bounded by the size of F, it must be possible to obtain for any derived f-description FD an equivalent description whose constants are drawn from a fixed finite set. For instance, if we introduce a distinct canonical constant for each element of F, we can create an equivalent description by substituting for each node constant in FD the canonical constant associated with the functional element corresponding to that node. This substitution typically reduces the number of distinct terms needed for instantiation, and its usual effect is to replace several different node constants with a single canonical term. But these replacements will provide an equivalent description because we substitute a given term for two node constants if and only if it logically follows from FD that those two nodes map to the same element of F. Thus, if FD discriminates between two elements of F, so will the description that results from such a reducing substitution.
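A reducing substitution of this kind is simple to state once the correspondence between node constants and elements of F is known. The sketch below assumes that this correspondence is supplied as a dictionary; the canonical constants f0 and f1 used in the example, the tuple encoding of terms, and the name `reduce_description` are hypothetical.

```python
def reduce_description(fd, canon):
    """Replace each node constant by its canonical constant.

    fd:    set of equations, each a pair of terms; a term is a tuple
           (constant, attr1, ..., attrk)
    canon: node constant -> canonical constant of the f-structure element
           it denotes (assumed to be given)
    """
    def sub(term):
        head, path = term[0], term[1:]
        return (canon.get(head, head),) + path   # unmapped constants stay as they are
    return {(sub(left), sub(right)) for left, right in fd}
```

For the description {(root subj) = n1, root = n2}, mapping root and n2 to f0 and n1 to f1 gives {(f0 subj) = f1, f0 = f0}. Two node constants receive the same canonical constant here only because the description itself entails that they denote the same element.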
Our context-free grammar construction crucially depends on the ability to find for every f-description of F (from every possible c-structure) an equivalent description that involves only a finite number of distinct instantiation terms. This is what enables us to simulate all the conditions for correct LFG generation with a finite set of context-free category labels and a finite set of context-free productions, and thus to rely on the finite control of the rule-by-rule category matching process of context-free generation to produce the strings in GenG(F).
The set of terms that correspond directly to the elements of F is large enough to enforce all functional discriminations for an f-description associated with a complete c-structure, as we have suggested. But unfortunately that set may not be large enough to keep track of all necessary distinctions as an f-description is created in an incremental context-free derivation process. For some f-structures and some grammars it may not follow from the description associated with one portion of a derivation tree that two nodes map to the same functional element, even though that identity does follow when equations in the f-description for the entire tree are taken into account.
Suppose that a three-daughter LFG start rule provides the instantiated annotations (root F) = n1, (root G) = n2, and root = n3. Based only on this information we cannot tell whether n1 and n2 can map to the same element of the f-structure and therefore whether it is correct to substitute the same canonical constant for both of them. It depends on whether the larger description that incorporates the expansion of the third daughter implies the identity of the n1 and n2 structures. The same-constant substitution would preserve equivalence only if the larger description implies that (n3 F) = (n3 G). We must have two distinct constants available until that implication is deduced in the course of the derivation, even if the input f-structure does not contain separate elements for those constants to correspond to.
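The dependence on the third daughter can be spelled out as a chain of equalities. Under the assumption that the expansion of n3 does contribute the equation (n3 F) = (n3 G), the identity of n1 and n2 follows from the start-rule annotations by substitution:

```latex
\begin{align*}
n_1 &= (\mathit{root}\ F) && \text{start-rule annotation} \\
    &= (n_3\ F)           && \text{since } \mathit{root} = n_3 \\
    &= (n_3\ G)           && \text{assumed to follow from the expansion of } n_3 \\
    &= (\mathit{root}\ G) && \text{since } \mathit{root} = n_3 \\
    &= n_2                && \text{start-rule annotation}
\end{align*}
```

Without the third step the chain breaks, and nothing in the start rule alone licenses the use of a single constant for both n1 and n2.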
Thus the set of constants needed to correctly reduce an arbitrary description as a derivation proceeds incrementally may be larger than the number of elements in the input f-structure and larger than what is required for an equivalent description for a complete derivation. However, for each acyclic F we show that there is always a finite set of canonical terms that can maintain all necessary functional discriminations as a derivation unfolds. We use this set to construct a reducing substitution that permits generation to be carried out under finite control. In contrast, we observe in Section 6 that the partial descriptions of cyclic structures cannot safely be reduced without an unbounded number of canonical terms.
For the derivations that the simple LFG grammar in (8) provides for the input [H V], we can accomplish the reduction of the f-description space with only two terms, the canonical constant root and a separate canonical constant ⊥ that serves as a value for all nodes that do not occur in an f-description. Let us start with the shortest derivation for the given input. This consists of the c-structure in (10), which is licensed by the rule-mapping .
- (10)
- (11)
The reducibility of the f-description space for F provides the key insight for our context-free grammar construction. The construction is accomplished in three steps. In the first step we identify (as illustrated earlier) a finite set of canonical terms that can serve in reducing the f-description space that G provides for F.
In the second step we use these terms to construct a set of instantiated rules of G. These instantiations are “appropriate” in the sense (to be made precise later) that they maintain all necessary distinctions. They are formed by associating with the metavariables canonical terms that can legitimately be used to reduce the corresponding nodes of a local tree of a potential derivation of F. For the grammar (8) and the terms root and ⊥, for example, there are only three appropriately instantiated rules, the two rules contained in (11b) and . We then determine all collections of appropriately instantiated rules that together provide descriptions of F without mistakenly collapsing a functional discrimination. For our particular example there are just two collections of instantiated rules that provide a description of [H V], namely, and the set in (11b). These collections are drawn from the power set of the appropriately instantiated rules, so there is only a finite number of them and each contains a finite number of instantiated rules. This ensures that we can determine the f-description space that G provides for F without knowing the details of the derivations for F and their reductions.
In the third and final step we create the context-free grammar that simulates exactly those derivations in G whose strings are assigned the f-structure F. The categories of this new grammar consist of refinements of the categories of the context-free backbone of G together with a distinct root category SF. The original categories are augmented with two additional components, a canonical instantiation term as used in the first step, and a subset of one of the instantiated-rule collections determined in the second step. The term component is used to encode the reducing substitution for the f-description of a simulated derivation, and the rule component is used to record the reduced instantiations of the licensing LFG rules whose application must still be simulated in order to complete that derivation. The productions of the new grammar are created from the rules contained in the instantiated-rule collections by replacing the original categories by a certain number of their refinements, and then adding a particular set of start rules. The start rules expand the root category of the new grammar to the original start symbol augmented by root and one of the instantiated-rule collections determined in the second step.
The context-free grammar thus constructed has a much larger set of categories and many more rules than G. It is organized so that the normal matching of categories in a context-free derivation globally ensures that the refined rules simulate all derivations of F in G whose f-description is reducible to a description provided by one of the instantiated-rule collections determined in the second step. Because we have already indicated that every f-description of F must be reducible to a description provided by one of these instantiated-rule collections, the constructed grammar simulates exactly the set of derivations that G provides for F. The strings of GenG(F) are obtained by removing the additional components from the categories of the terminal strings. With r1 abbreviating and r2 abbreviating our construction produces for the LFG grammar in (8) and the input [H V] a context-free grammar that contains the rules in (12).
- (12)
In the remainder of this section we first identify the finite set of canonical terms that can be used to reduce the f-description space for a given f-structure F derivable with an LFG grammar G. We then investigate in Section 3.2 the problem of reducing the f-description space for F and G. In Section 3.3 we give a precise recipe for constructing the context-free grammar for F and G, and in Section 3.4 we illustrate this with a few examples.
3.1 Identifying the Reducing Terms
Our reduction of the f-description space makes use of the fact that we can eliminate certain node constants from an f-description FD without risk of producing a description not equivalent to the original. This is because some node constants can be defined in terms of others. We proceed rule-wise top–down based on the following definability relation.
Definition 8
Let r be an m-ary LFG rule, t be a term, and a1..am be a sequence of constants of length m, each of them not occurring in t. A constant aj is m(other)-definable in Inst(r, (t, a1..am)) iff there is a (possibly empty) σ such that Inst(r, (t, a1..am)) ⊢ aj = (t σ).
If the constant aj that instantiates the ↓ for a particular daughter is m-definable in terms of (t σ) in Inst(r, (t, a1..am)), then all functional discriminations will be preserved if aj is eliminated in favor of the term (t σ) from any description containing this instantiated description of r.
To illustrate the elimination process, let us assume that our grammar includes among its rules the ones in (13).
- (13)
- (14)
In this derivation, the node constants n1, n2, n4, n5, and n8 are m-definable whereas root, the adverbial nodes n7 and n10, and the terminal nodes are not. For all m-definable constants, we can construct definitions rule-wise top–down in the following way. We begin with the start rule and derive from its instantiated description {(root subj) = n1, root = n2} the definitions n1 = (root subj) and n2 = root for the m-definable daughters n1 and n2. We then continue with the rules that expand n1 and n2. Let us consider the VP rule that expands n2. For the m-definable n2 we use the already constructed definition to replace n2 by its defining term root in the instantiated description of the VP rule. From the resulting description we then derive the definitions n4 = root and n5 = root for its m-definable daughters, and so forth. If we run like this through the whole derivation, for all m-definable daughters we obtain defining terms that do not contain mother-definable node constants. For our example these are the ones in (15).
- (15)
By replacing all mother-definable node constants with their defining terms we can then produce from the original f-description the equivalent description in (16).
- (16)
Thus we see that a mother-definable constant can be eliminated in favor of a constant corresponding to a higher node and a sequence of attributes leading down through the f-structure. The constants that are not eliminable are the root constant root (if it occurs in FD) and all daughter constants that occur in FD but are not mother-definable. At least for acyclic f-structures, however, we can show that there is an upper bound on the number of these remaining constants. This is because the remaining constants must each denote one of the elements of the given f-structure, but no two of them can denote the same element. This is a consequence of LFG's instantiation procedure and functional uniqueness condition, and the acyclicity of the f-structure.
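The rule-wise top–down construction of defining terms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree shape in `DAUGHTERS`, the encoding of a term as a constant plus an attribute path, and the function name `defining_terms` are all hypothetical and are not taken from the rules in (13) or the derivation in (14). Each daughter is paired with the attribute sequence σ of an annotation that makes it m-definable, or with `None` if it is not m-definable.

```python
from typing import Dict, List, Optional, Tuple

# A term is a pair (constant, attribute path); ("root", ("subj",)) stands for (root subj).
Term = Tuple[str, Tuple[str, ...]]

# Hypothetical derivation tree: each mother maps to its daughters, and each
# daughter carries the attribute path sigma of an annotation (mother sigma) = daughter,
# or None when the daughter is not m-definable.
DAUGHTERS: Dict[str, List[Tuple[str, Optional[Tuple[str, ...]]]]] = {
    "root": [("n1", ("subj",)), ("n2", ())],
    "n2": [("n4", ()), ("n5", ())],
    "n5": [("n7", None)],
    "n7": [("n8", ())],
}


def defining_terms(root: str = "root") -> Dict[str, Term]:
    """Walk the tree top-down and express each m-definable node by a term
    built on the nearest constant above it that could not be eliminated."""
    definitions: Dict[str, Term] = {}
    current: Dict[str, Term] = {root: (root, ())}  # term that each visited node stands for
    agenda = [root]
    while agenda:
        mother = agenda.pop(0)
        base, path = current[mother]
        for daughter, sigma in DAUGHTERS.get(mother, []):
            if sigma is None:
                # Not m-definable: the daughter keeps its own constant.
                current[daughter] = (daughter, ())
            else:
                # m-definable: substitute the mother's defining term.
                current[daughter] = (base, path + sigma)
                definitions[daughter] = current[daughter]
            agenda.append(daughter)
    return definitions
```

In this toy tree, `n1` is defined as `(root subj)`, `n2`, `n4`, and `n5` are all defined as `root`, and `n8` is defined in terms of the undefinable `n7`, so no defining term contains a mother-definable constant.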
Given LFG's instantiation procedure, as formalized in Definition 2, two distinct node constants can be related in a single equation only if the nodes stand in a mother–daughter relationship. Thus a daughter and a node external to the mother cannot be related directly by instantiation but only as a consequence of a deduction involving at least one instantiated annotation of some other licensing rule. Because of LFG's functional uniqueness condition the equations involved in such a deduction cannot contain atomic feature values. The constant/complex clash condition (vii) of Definition 4 prevents atomic values from being substituted for proper subterms and the constant/constant clash condition (vi) prevents them from being equated to nodes. Thus such a deduction can only involve equations relating a daughter to its mother (or a node to itself).
If an undefinable daughter corefers with a node external to the mother, then the deduction that relates them must involve an instantiated annotation of that daughter that is (up to symmetric permutation) of the form (↑ σ) = (↓
Lemma 1
Let c and ρ be a derivation with f-description FD for an acyclic f-structure in G. If nj is a daughter of n and nj is not m-definable in Inst(ρn) then for all n′ not dominated by nj and all (possibly empty) sequences of attributes χ.
Proof
The following corollary follows directly from Lemma 1.
Corollary 1
Let c and ρ be a derivation with f-description FD for an acyclic f-structure in G. If nj is a daughter of n and nj is not m-definable in Inst(ρn) then
, and
for any distinct node n′i that is not definable in terms of its mother n′ in Inst(ρn′).
From Corollary 1 it immediately follows that the denotations of FD's undefinable node constants are biunique. So, their number must be less than or equal to the size of the universe of a minimal model M of FD. Suppose F is the f-structure for a derivation with f-description FD. Because F is isomorphic to M|Σ for any minimal model M of FD, and M|Σ and M share the same universe, we can use F's (finite) universe to define the constants that we require. Thus for each element a of the universe of F that is not denoted by an atomic value we introduce a constant aa.10
Definition 9
The set provides a sufficient number of constants to produce an equivalent reduced description by a biunique renaming of the node constants that remain after the m-definable ones are eliminated.
The renaming of the remaining undefinable constants can be accomplished, for example, if we map in the natural way each mother-undefinable daughter n corresponding to a in the isomorphic image F of M|Σ to the constant aa. Because of Corollary 1, such a mapping must be biunique. Hence it can be used to rename all undefinable daughters occurring in FD and will thus produce an equivalent description where all nodes except root are replaced by constants drawn from F.
As an illustration we pick for the f-structure depicted in Figure 4 the structure with the universe in (17a) and the interpretation function whose directed acyclic graph representation is given in (17b).11
- (17)
The constants we obtain from the structure (17) by Definition 9 are the ones in (18).
- (18)
Now, let M be a minimal model of our original f-description. Because an isomorphism between M|Σ and our structure (17) must map the denotation of n7 in M to f and the denotation of n10 to g, we can rename n7 by af and n10 by ag and obtain from (16) the equivalent description (19).
- (19)
We next compose the substitution that is induced by the definitions of the definable daughters and the substitution that we used to rename the undefinable daughters. This provides a substitution that allows us to produce from the original f-description an equivalent description in a single transformation. If we compose the two substitutions of our example, that is, the one induced by the definitions in (15) and the renaming substitution {(n7, af), (n10, ag)}, we arrive at the reducing substitution in (20).
- (20)
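The composition of the two substitutions can be pictured with a small Python sketch. The representation of terms as pairs of a constant and an attribute path, the function name `compose`, and the sample definitions are assumptions made for illustration; they do not reproduce the contents of (15) or (20).

```python
from typing import Dict, Tuple

Term = Tuple[str, Tuple[str, ...]]  # (constant, attribute path)


def compose(definitions: Dict[str, Term], renaming: Dict[str, str]) -> Dict[str, Term]:
    """Rename the constant inside every defining term, then add the
    renamed undefinable constants themselves."""
    reducing = {
        node: (renaming.get(const, const), path)
        for node, (const, path) in definitions.items()
    }
    for node, new_const in renaming.items():
        reducing[node] = (new_const, ())
    return reducing


# Hypothetical definitions in the style of the running example.
definitions = {"n1": ("root", ("subj",)), "n2": ("root", ()), "n8": ("n7", ())}
renaming = {"n7": "a_f", "n10": "a_g"}
reducing = compose(definitions, renaming)
```

After composition, a node defined in terms of an undefinable daughter (here `n8`, defined via `n7`) is mapped directly to the canonical constant that renames that daughter, so the original f-description can be reduced in a single pass.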
We now give a precise specification of a (finite) set of terms that can serve as the range of the reducing substitutions for all derivations of an f-structure F. This set is obtained from the constants in and the attributes of F in the following way. We first provide the constants with their intended interpretation by expanding F in the natural way to the canonical structure for .
Definition 10
A set of canonical terms that includes the ranges of the reducing substitutions for all possible derivations of F is a set that contains all terms of the form (aa σ) that are defined in but do not denote an element already designated by an atomic feature value. It also contains all terms that we obtain from those by substituting root for their constant symbols. This set includes all constants of (because σ can be empty) and thus all possible constant values for the mother-undefinable daughters of a derivation for F. Because each element in the universe of is denoted by a constant, also contains all possible defining terms for the mother-definable nodes of that derivation. Terms referring to the denotation of an atomic feature value are not required, since there are (because of the constant/constant clash condition) no node constants with the same denotation as any atomic feature value.
The node constant root is substituted for the constants in every term to account for the fact that different derivations may associate different functional elements with the root of the c-structure. That would be the case, for example, if our grammar contains in addition the S and VP rules (21a,b) and alternatively derives the adverbials with the rules (21c,d).
- (21)
Thus contains sufficiently many constant symbols and defining terms for the reducing substitutions to make all the distinctions that could arise from any c-structure and f-description for the given F. It is defined formally in the following way.
Definition 11
- (22)
3.2 Reducing the f-Description Space
We now shift our attention to the rules of G and their instantiating terms, that is, to the arguments of the Inst function. These are pairs consisting of an m-ary rule r of G and its instantiating terms (t, t1.. tm). Let us call such a pair an instantiation of r, or sometimes simply an instantiated rule. Let us further extend the reducing substitutions that we constructed for the derivations of F to total functions by assigning root to root and ⊥ to each non-denoting node constant. Now recall that the f-description of a particular derivation for F consists of the union of the instantiated descriptions of the rules that together license that derivation. If we consider these licensing rules together with their node instantiation, that is, pairs of the form (r,(n, n1.. nm)), and use a reducing substitution for that derivation to replace the node constants in the instantiations by canonical terms, then we obtain a collection of instantiated rules of the form (r, (t, t1.. tm)) all of which are instantiated by terms of . The union of the instantiated descriptions of these rules is identical to the description that the reducing substitution produces from the original f-description. Because R and are finite, the set of all instantiated-rule collections that we obtain from the (possibly infinite) set of derivations of F by reducing their node-instantiated licensing rules must be finite too. This fact is crucial for our grammar construction.
We further observe that the instantiated rules that result from this substitution are also appropriate in the following sense.
Definition 12
Let r be an m-ary LFG rule in R of G (m ≥ 0), F be an f-structure, , and a1..am be a sequence of length m of pair-wise distinct constants not in . Then the instantiated rule (r,(t, t1.. tm)) is appropriately instantiated (by terms of ) iff the following conditions are satisfied:
- (i) if tj = ⊥ then aj is not interpreted in a minimal model of Inst(r, (t, a1..am)),
- (ii) if aj is m-definable in Inst(r, (t, a1..am)) then Inst(r, (t, a1..am)) ⊢ aj = tj,
- (iii) otherwise , tj ≠ t and tj ≠ ti for all i = 1,..,m with i ≠ j.
In the following the set of all appropriately instantiated rules is denoted by IRF (IRF = {(r, (t, t1..tm)) | (r, (t, t1..tm)) is appropriately instantiated}).
The constants a1..am in this definition provide the same discriminations as the daughter nodes of any local tree licensed by the rule. This definition is satisfied by rules that result from eliminating node constants in favor of terms in the way that we have described. Such term-instantiated rules satisfy condition (ii), because whenever the mother is instantiated by t and an m-definable daughter nj is reduced to a term then also Inst(r, (t, a1..am)) ⊢ aj = at (= (t σ)). Condition (iii) is satisfied, because of the pair-wise distinctness of the values for the mother-undefinable nodes, due to Corollary 1. And condition (i) holds, because non-denoting node constants are mapped to ⊥.12 The set IRF of all possible appropriately instantiated rules is large but finite, because R and are finite.
For our start rule (13a) , only the two instantiations in (23) are appropriate.
- (23)
- (24)
- (25)
The instantiations in (23a) and (24a) are the ones obtained from the derivation in Figure 4 and the reducing substitution (20). Note that the appropriately instantiated rule (23b) that does not associate the S node with the root constant might result from derivations where the top of the f-structure is not denoted by the root node of the c-structure, as illustrated with the rules in (21).
So far we have considered only the individual instantiated rules that we obtain from the licensing rules of a derivation for F by replacing the node constants as described by terms of . As a consequence of Corollary 1, we also observe that our reducing substitutions never replace undefinable daughters of two distinct node-instantiated licensing rules by one and the same constant. That is, the term-instantiated rules that result from two distinct node-instantiated licensing rules always satisfy the following compatibility relation.
Definition 13
The instantiated rules in (26a–c), for example, are compatible while the ones in (26d) are not. The latter rules mistakenly introduce an identity that, because of Corollary 1, can never be derived by the grammar. The rules in (26a) result from reducing the licensing rules of the derivation in Figure 4 with the reducing substitution (20).
- (26)
Our observations lead to a definition that characterizes reducing substitutions entirely in terms of the identified properties of the -instantiated rules and thus in a way that will permit us to simulate their construction by a refinement of the context-free backbone of G. In the following definition we use to denote the nodes of a c-structure c and γ[ψ] to indicate the expression that is obtained from an expression γ (term, sequence of terms, formula, set of formulas, etc.) and a substitution ψ (mapping from constants to terms) by replacing all occurrences of constants a in γ simultaneously by ψ(a).
Definition 14
Let c and ρ be a derivation of f-structure F in G and ψ be a mapping from into . Then ψ is a reducing substitution for the given derivation iff ψ(root) = root, and for all n, n′ ∈ Dom(ρ) with n ≠ n′
(ρn, (n, dts(n))[ψ]) is appropriately instantiated, and
(ρn, (n, dts(n))[ψ]) is compatible with (ρn′, (n′, dts(n′))[ψ]).
That reducing substitutions in fact preserve equivalence is then established by the following lemma.
Lemma 2
Let c and ρ be a derivation with f-description FD and f-structure F in G. If ψ is a reducing substitution for c and ρ, then FD ≡ FD[ψ].
Proof
We prove the lemma by induction on the number of nodes, according to a left-to-right, top–down traversal of the c-structure. Let c and ρ be a derivation with f-description FD and f-structure F in G, M a minimal model of FD, and ψ a reducing substitution for c and ρ. We first define for each node n of c the set consisting of all nodes higher than n, all nodes of the same depth as n but preceding it (on the left), and n. Now for each with let the function ψi be the restriction of ψ to . Then we can show by induction for each that FD ≡ FD[ψi], that is, left-to-right, top–down. The equivalence is established by constructing a minimal model Mi also on the universe of M. Thus the isomorphism between M|Σ and Mi|Σ is the identity function.
The basis, i = 1, is trivial, because ψ1 = {(root, root)} by definition. Thus FD[ψ1] = FD and M1 = M is a minimal model of FD[ψ1]. Hence FD ≡ FD[ψ1]. For the induction step, let i > 1. Then FD ≡ FD[ψi−1] by hypothesis. Let Mi−1 be a minimal model of FD[ψi−1], and suppose that node nj with mother n is the next node in the sequence (i.e., ).
If nj is not interpreted in M, it does not occur in FD and hence not in FD[ψi−1]. Thus FD[ψi] = FD[ψi−1], Mi = Mi − 1 is a minimal model of FD[ψi], and FD ≡ FD[ψi].
If nj is interpreted in M, there are two cases to consider.
(a) If nj is m-definable in Inst(ρn, (ψi−1(n), dts(n))) and ψ(nj) = tj then FD[ψi−1] ⊢ nj = tj. Because nj does not occur in tj and hence not in FD[ψi], FD[ψi−1] is logically equivalent to the definitional extension FD[ψi] ∪ {nj = tj} of FD[ψi]. Because tj occurs in FD[ψi], Mi = Mi−1|(Dom(Ii−1)\{nj}) is a minimal model of FD[ψi]. Hence Mi − 1|Σ ≅ Mi|Σ and FD ≡ FD[ψi].
(b) If nj is not m-definable in Inst(ρn, (ψi−1(n), dts(n))) then . Let ψ(nj) = aa. Then aa cannot occur in FD[ψi−1], because the instantiation is appropriate and pair-wise compatible and aa ≠ root (= ψ(root)).13 So the model Mi that results from Mi − 1 by renaming nj by aa must be a minimal model of FD[ψi]. Thus Mi − 1|Σ ≅ Mi|Σ and FD ≡ FD[ψi].
Hence, FD ≡ FD[ψ]. ▪
Appropriateness and compatibility do not ensure that undefinable daughter constants are distinct from the root. This case is covered, however, because we kept root for the root.
We indicated earlier that for an arbitrary derivation of an acyclic f-structure F we can—dependent on a minimal model of its f-description—construct a substitution with range that satisfies the conditions of Definition 14. We now provide a rigorous proof of this assertion.
Lemma 3
For every derivation of an acyclic f-structure F in G there exists a reducing substitution.
Proof
Definition 15
Let F be an f-structure. Then IRDF is the set of all sets IR ⊆ IRF such that
- (i) for all (r, τ), (r′, τ′) ∈ IR with (r, τ) ≠ (r′, τ′), (r, τ) is compatible with (r′, τ′),
- (ii) M|Σ ≅ F, for a minimal model M of Inst(IR).
This is a finite set whose size is bounded by a function of the sizes of R and .
Lemma 2 also shows that we can produce an equivalent description for any derived f-description of F, not only with the model-dependent substitutions used in the proof of Lemma 3, but in general with any mapping that satisfies the definition of a reducing substitution. This is important for our grammar construction, because it provides the conditions that we have to control to make sure that we simulate the derivations of f-descriptions for F together with equivalence-preserving substitutions. Under these conditions we can reduce the sets of node-instantiated licensing rules of the simulated derivations to collections that are also included in IRDF. IRDF can be determined without knowing the details of the valid derivations for F, just on the basis of F and the LFG grammar G alone.
3.3 Producing the Context-free Grammar GF
The context-free grammar GF that simulates all valid derivations for F in G is specified in the following definition. From this we can produce all strings in GenG(F) by conventional context-free generation algorithms.
Definition 16
- (i) SF → S:root:IRroot, where IRroot is any element of IRDF,
- (ii) A:t:IR → X1:t1:IR1..Xm:tm:IRm such that
- (a) there is an r ∈ R expanding A to X1..Xm,
- (b) IR = {(r, (t, t1..tm))} ∪ IR1 ∪ .. ∪ IRm,
- (c) if (r, (t, t1..tm)) ∈ IRj (j = 1,..,m), or (r′, τ′) ∈ IRi ∩ IRj and i ≠ j (i, j = 1,..,m), then (r, (t, t1..tm)), respectively (r′, τ′), is compatible with itself.
Before presenting our main theorem and its proof let us sketch how the derivations for F in G are simulated by the context-free grammar GF.
The grammar GF expands the root symbol SF to complex categories of the form S:root:IRroot containing the root category S of G as their first component. A derivation from S:root:IRroot in GF then consists of a phrase structure tree whose nodes are labeled with refinements of the categories of the original LFG grammar. By taking the Cat projection of every category, we obtain the c-structure of at least one derivation for F in G that is simulated by the derivation from S:root:IRroot in GF. The term component of the augmented categories encodes a reducing substitution ψ for the simulated derivation with the given c-structure. That is, if a node n in the GF derivation is labeled by X:t:IR, then ψ(n) = t for the corresponding LFG c-structure tree.
The component IR contains all instantiated rules of G that are required to license the subderivation in G that corresponds (under the Cat projection) to the subderivation from n in GF, except that the licensed nodes are replaced in the instantiated rules by their ψ values.17 Thus, the additional components of the root label S:root:IRroot record that ψ(root) is set to root (the initial condition for reducing substitutions) and that the node-instantiated licensing rules of the simulated derivation are reduced to IRroot by ψ. Each application of a rule A:t:IR → X1:t1:IR1..Xm:tm:IRm that expands a nonterminal node n of the derivation in GF simulates the application of an LFG rule with context-free backbone A →X1..Xm whose instantiation with (t,t1..tm) combines with the instantiated-rule components of all daughters to form the rule component IR of the mother.18 Now, ψ must be a reducing substitution for the simulated derivation, because all instantiated rules in IRroot are appropriately instantiated and pair-wise compatible and because condition (iic) of Definition 16 ensures that rules that are not self-compatible can only be used once for licensing the Cat projection. Thus, because of Lemma 2, the derivation in GF simulates a derivation of an f-description in G that ψ reduces to the equivalent description provided by IRroot.
We can also see that every derivation for F in G is simulated by a derivation in GF. We know from Lemmas 2 and 3 that we can construct for every derivation of an f-description for F in G a reducing substitution ψ that produces a description equivalent to the original one. Based on ψ we can then augment the category labels of the c-structure of a derivation for F in G by term and rule components that record ψ and the licensing rules (with the node constants replaced by their ψ values). We thus obtain a derivation from S:root:IRroot where the instantiated description provided by IRroot is equivalent to the original f-description. Because GF contains a start rule for every set of appropriately instantiated and pair-wise compatible rules that provides a description of F, there must also be a rule that expands SF to S:root:IRroot and the terminal string of the derivation for F in G must be the Cat projection of a derivable string in GF.
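The bookkeeping carried by the refined categories can be made concrete with a short Python sketch. It assumes, for illustration only, that terms and rule names are plain strings, that the placeholder `"bot"` stands for ⊥, and that the names `Category`, `mother_category`, and `cat_projection` are hypothetical helpers. The sketch shows how the rule component of a mother is assembled from the local instantiated rule and the rule components of its daughters, and how the Cat projection discards the added components. It does not check appropriateness or the self-compatibility condition (iic).

```python
from typing import FrozenSet, List, NamedTuple, Sequence, Tuple

InstRule = Tuple[str, Tuple[str, ...]]  # (rule name, instantiating terms (t, t1..tm))


class Category(NamedTuple):
    cat: str                    # category of the LFG backbone
    term: str                   # psi value of the node
    rules: FrozenSet[InstRule]  # instantiated rules licensing the subderivation


def mother_category(rule: str, cat: str, term: str,
                    daughters: Sequence[Category]) -> Category:
    """Build the mother label: its rule component is the local instantiated
    rule together with the rule components of all daughters."""
    local: InstRule = (rule, (term,) + tuple(d.term for d in daughters))
    combined = frozenset({local}).union(*(d.rules for d in daughters))
    return Category(cat, term, combined)


def cat_projection(labels: Sequence[Category]) -> List[str]:
    """Strip the term and rule components, keeping only the original categories."""
    return [label.cat for label in labels]


# Hypothetical two-word derivation; terminals carry an empty rule component.
john = Category("John", "bot", frozenset())
fell = Category("fell", "bot", frozenset())
np = mother_category("NP->John", "NP", "(root subj)", [john])
vp = mother_category("VP->fell", "VP", "root", [fell])
s = mother_category("S->NP VP", "S", "root", [np, vp])
```

The rule component of `s` then collects all three instantiated rules of this toy derivation, which is the role IRroot plays in the label S:root:IRroot.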
We are now prepared to prove our main theorem.
Theorem
For any LFG grammar G and any acyclic f-structure F, GenG(F) = Cat(L(GF)).
Proof
We prove first that GenG(F) ⊆ Cat(L(GF)). Suppose there is a derivation c and ρ of a terminal string s with f-description FD and f-structure F in G. By Lemma 3, there exists a reducing substitution ψ for c and ρ. Thus FD ≡ FD[ψ] by Lemma 2. We construct a derivation c′ and ρ′ of s′ from S:root:IRroot with Cat(s′) = s. We obtain c′ by relabeling each node n with label X by . That means that the c-structures of both derivations share the same tree skeleton. We define ρ′ for each nonterminal node n with label A:ψ(n):IR and dts(n) = n1..nm with labels X1:ψ(n1):IR1, ..,Xm:ψ(nm):IRm by . Because FD[ψ] = Inst(IRroot) by construction of IRroot and because IRroot ⊆ IRF and condition (i) of Definition 15 hold by the properties of ψ, IRroot must be an element of IRDF. Thus SF → S:root:IRroot is in RF. Moreover, Ran(ρ′) ⊆ RF, because by construction the rule components are subsets of IRroot, the rule components of the terminals are empty, and the rules satisfy (iia,b) of Definition 16 by construction and (iic) because ψ is a reducing substitution. Thus s ∈ Cat(L(GF)).
We now prove that Cat(L(GF)) ⊆ GenG(F). Suppose there is a GF derivation c′ and ρ′ of s′ from S:root:IRroot with Cat(s′) = s and IRroot. We first construct a new c-structure c with the same tree skeleton as c′ by relabeling each node n with label X:t:IR by X. We define a substitution ψ by setting ψ(n) = t for each node n with label X:t:IR. We then show that there is a mapping ρ into R licensing c with FD ≡ Inst(IRroot). By induction on the depth of the subtrees we first define for each nonterminal n a function ρn from all nonterminal nodes dominated by n into R such that
- (a) ρn licenses the subtree of c with root n,
- (b) if n has label A:t:IR in c′,
and, for all with
- (c) is appropriately instantiated and
- (d) is compatible with .
The following corollary is an immediate consequence of this theorem.
Corollary 2
For any LFG grammar G and any acyclic f-structure F, GenG(F) is a context-free language.
3.4 A Few Examples
In the preceding sections we have shown how to construct a context-free grammar that generates exactly the set of strings that an LFG grammar assigns to a given f-structure. Those strings can be produced by running a context-free generator with that grammar. In this section we provide examples to illustrate the derivation space of the constructed context-free grammar and the correspondence between the derivations of the constructed grammar and the derivations of the original LFG grammar.
As one illustration of the correspondences between the derivations, let us consider the f-structure F in (27) and the LFG grammar with the rules (13) and the VP rule in (2).
- (27)
Both the depicted substitution ψ and the subtree to which SF expands are related to the original LFG derivation by the construction of the first half of our proof. That is, ψ is a reducing substitution and the context-free derivation specializes the category label of each node n of the original c-structure. The term component is n's ψ value. The rule component is the set of all instantiated rules that result from the licensing rules of the corresponding n-dominated LFG subderivation. These are instantiated by replacing the instantiating nodes of the LFG derivation by their ψ values. Thus, the instantiated description provided by the rule component of the start rule is equivalent to the original f-description and hence the context-free derivation tree at the bottom of Figure 6 is licensed completely by the rules of the constructed grammar. Note that the Cat projection of the terminal string of the context-free derivation is the terminal string of the c-structure, the sentence John fell.
On the other hand, the depicted LFG derivation and the context-free derivation are also related by the construction of the second half of the proof. The c-structure is the Cat projection of the constituent structure that SF's daughter derives. The reducing substitution maps each node of this c-structure to the term of its complex label in the corresponding context-free derivation. And the LFG rule that the licensing mapping maps to each node is the rule of the node label's rule component that licenses the node and its daughters in the Cat projection. This is instantiated by the term components of the applied context-free rule and combines with the rule components of the daughters to form the rule component of the mother. These licensing LFG rules for the immediate daughters are shown in gray in the rule component of the node labels in the context-free derivation.
As a more complicated illustration, we sketch the derivations of the context-free grammar GF produced for the f-structure F given in (17) and the grammar comprising the rules in (13). This LFG grammar produces two terminal strings for the given input, John fell today quickly and John fell quickly today. A set of pair-wise compatible appropriately instantiated rules that yields a description of the input f-structure is, for example, the one contained in the start rule (28). This set arises from reducing the node-instantiated licensing rules of the derivation in Figure 4 with the reducing substitution (20) extended by mapping non-denoting nodes to ⊥.
- (28)
- (29)
- (30)
We see then that the left daughter of (29) matches the mother of (31) that derives the terminal symbol “”.
- (31)
- (32)
- (33)
For the same reasons, (34) is the only useful rule that matches the right daughter of (33).
- (34)
- (35)
- (36)
The only other derivation of a string with f-structure F is simulated if we use a rule like (28) except that the ADVP rules are instantiated as in (37), that is, exactly the other way around.
- (37)
There are alternative derivations in GF that also simulate these two LFG derivations, and in that sense the grammar GF allows for spurious ambiguities. These derivations differ from the given ones in that the instantiating constants of are biuniquely renamed (e.g., af by aa and ag by ad) or some of the terminal daughters with no ↓ in their annotation are biuniquely instantiated by otherwise unused constants of . In the next section we consider some computational strategies for eliminating rules that fail to produce terminal strings or give rise to spurious ambiguities.
4. Computational Considerations
So far we imposed only loose restrictions on the ingredients of the generation grammar GF, and a faithful implementation of the grammar definition may create categories and rules that are either useless or redundant. Useless rules cannot participate in the simulation of any LFG derivation while redundant ones simulate only the same derivations as other rules and categories in the grammar. There are a number of techniques for avoiding the construction of these unnecessary and undesirable grammar elements.
If the equations in an LFG rule provide alternative definitions for one and the same daughter, a naive implementation would produce distinct but equivalent daughter instantiations. Rule and category instantiations that express only uninformative variation can be eliminated by normalizing the rule annotations in advance of generation so that there is exactly one canonical function-assigning equation for each mother-definable daughter and by using that equation to construct its defining term. Normalization can be accomplished by exploiting symmetry and substitutivity to reduce the annotations of the rules to some normal form according to an appropriate complexity norm, as suggested by Johnson (1988). Another off-line computation can identify terminal daughters that are introduced with rules that do not contain ↓ and so will never be interpreted. Without loss of generality we can disregard other instantiating constants that might be drawn from and systematically instantiate all of those terminals with the distinguished constant ⊥.
We can remove another major source of redundancy by ignoring derivations that differ only by renaming of the instantiating constants of . This can arise if IRDF contains rule sets that are identical up to renaming of the instantiating canonical constants, as indicated in Section 3.4. We observed in conjunction with Lemma 3 that the f-description of every derivation for F can be reduced to an equivalent description that is satisfied in the canonical model expanded by some interpretation of root. Thus the generation grammar can be constructed by considering only the set containing those elements of IRDF whose instantiated descriptions are modeled by some root expansion of .
Even with these refinements, the last example in Section 3.4 illustrates the fact that our recipe for constructing GF may produce other useless categories and expansion rules. These cannot play a role in any derivation either because they are unreachable from the root symbol SF or because they do not lead to a terminal string. We can borrow strategies from conventional context-free grammar processing to control the production of these useless items.
A top–down approach to grammar construction is the simplest way of avoiding categories and rules that are unreachable from the root symbol. It corresponds most directly to the specification of Definition 16. The algorithm maintains three data structures: an agenda of categories whose expansion rules have yet to be constructed, a set of terminal and nonterminal categories that have already been considered for expansion, and a set of constructed context-free rules. All three structures are empty at the outset. The first step of the algorithm is to add the root category SF to the agenda. Then at each subsequent step a category α is selected from the agenda and moved to the set of considered categories, all rules α→β1..βm satisfying conditions (i) (with the restricted subset of IRDF described above in place of IRDF) and (ii) of Definition 16 are added to the rule set, and each of the nonterminals βj not already considered is added to the agenda. Because Definition 16 provides for a finite number of categories, the agenda eventually will become empty. At that point the algorithm terminates with the rule set containing a subset of RF sufficient to simulate all and only the LFG derivations for F. As indicated, this algorithm has the desirable property of creating just those categories and rules of GF that are accessible from the root symbol. It is guided incrementally by the c-structure skeleton of the LFG grammar. It is also guided by properties of the input f-structure, as the rule component for each new category is a subset of some element IRroot of that restricted subset. But this procedure has the disadvantage of typically producing many categories that derive no terminal string.
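The agenda discipline of the top–down strategy can be sketched in Python on an ordinary context-free grammar. This is a simplification under stated assumptions: categories are plain strings rather than refined categories of the form A:t:IR, and the candidate expansions of a category are looked up in a precomputed table (`TOY`, hypothetical) instead of being constructed from the conditions of Definition 16.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Rhs = Tuple[str, ...]


def top_down(start: str, expansions: Dict[str, List[Rhs]],
             terminals: Set[str]) -> Tuple[Set[str], Set[Tuple[str, Rhs]]]:
    """Construct only the categories and rules reachable from the start symbol."""
    agenda = deque([start])               # categories still to be expanded
    considered: Set[str] = set()          # categories already handled
    rules: Set[Tuple[str, Rhs]] = set()   # constructed context-free rules
    while agenda:
        alpha = agenda.popleft()
        if alpha in considered:
            continue
        considered.add(alpha)
        for rhs in expansions.get(alpha, []):
            rules.add((alpha, rhs))
            for beta in rhs:
                if beta in terminals:
                    considered.add(beta)      # terminals need no expansion
                elif beta not in considered:
                    agenda.append(beta)
    return considered, rules


# Toy grammar: C derives no terminal string, D is unreachable from S.
TOY: Dict[str, List[Rhs]] = {
    "S": [("A", "B"), ("C",)],
    "A": [("a",)],
    "B": [("b",)],
    "C": [("C",)],
    "D": [("d",)],
}
considered, rules = top_down("S", TOY, {"a", "b", "d"})
```

On this toy input the unreachable category `D` is never constructed, but the unproductive rule for `C` is, which mirrors the disadvantage noted above.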
An alternative strategy is to construct the categories and rules in bottom–up fashion. The bottom–up algorithm uses the same three sets, all empty at the outset. Here the first step is to add to the agenda all of the elements in the set TF of terminal categories. In each subsequent step a category is selected from the agenda and moved to the set of considered categories, as in the top–down approach. In this case, however, we add to the rule set all rules α→β1..βm that satisfy conditions (i) and (ii) of Definition 16 and where the selected category is at least one of the daughters βj and all other daughter categories already exist in the set of considered categories. If α is not SF, we further require α's rule component to be a subset of some IRroot so that this process is also constrained at each step by the input f-structure. The category α is added to the agenda if it is not already present in the set of considered categories. This algorithm also terminates when the agenda is empty. It ensures that every category we construct can derive a terminal string, but it does not guarantee that every bottom–up sequence will reach the root symbol.
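A corresponding Python sketch of the bottom–up agenda loop is given here, again on an ordinary context-free grammar. As an assumption made for brevity, the candidate rules are supplied as a precomputed list (`RULES`, hypothetical) rather than being built from Definition 16, and the subset test on rule components is omitted.

```python
from collections import deque
from typing import Iterable, List, Set, Tuple

Rhs = Tuple[str, ...]
Rule = Tuple[str, Rhs]


def bottom_up(terminals: Iterable[str],
              candidates: List[Rule]) -> Tuple[Set[str], Set[Rule]]:
    """Construct only rules whose daughters have all been built already."""
    agenda = deque(terminals)      # start from the terminal categories
    considered: Set[str] = set()
    rules: Set[Rule] = set()
    while agenda:
        selected = agenda.popleft()
        if selected in considered:
            continue
        considered.add(selected)
        for lhs, rhs in candidates:
            # The selected category must be a daughter, and every
            # daughter must already have been considered.
            if selected in rhs and all(b in considered for b in rhs):
                rules.add((lhs, rhs))
                if lhs not in considered:
                    agenda.append(lhs)
    return considered, rules


# Toy grammar: C derives no terminal string, D is unreachable from S.
RULES: List[Rule] = [
    ("S", ("A", "B")), ("S", ("C",)),
    ("A", ("a",)), ("B", ("b",)),
    ("C", ("C",)), ("D", ("d",)),
]
considered, rules = bottom_up(["a", "b", "d"], RULES)
```

Here the unproductive category `C` is never constructed, whereas the rule for `D` is built even though `D` cannot be reached from `S`, the converse of the top–down behavior.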
A more serious shortcoming of both strategies is that they presuppose the prior computation of all elements of IRDF, but neither specifies how to instantiate those rule sets in an efficient manner. A straightforward modification of the bottom–up algorithm can sidestep this difficulty. We can replace the subset test on the rule component of each α with a check to see whether the instantiated description of that component is satisfied in the canonical model of the input expanded by some interpretation of root. This test makes reference just to the canonical model of the input, examining only those features that are relevant to each potential new category. We reject a category if it fails this test, knowing that its rule component cannot be a subset of any element of IRDF. This is similar in spirit to the step-by-step subsumption test of other bottom–up generation algorithms (e.g., Shieber 1988 and Kay 1996). A further restriction is needed to filter the creation of start rules. Rules of the form SF → S:root:IR are included in the rule set only when some root expansion of the canonical model is not only a model for Inst(IR) but a minimal one at that. We know in that case that we have arrived at one of the elements of IRDF. The minimality condition is an analogue of the completeness requirement of other algorithms.
The incremental satisfiability test of this modified algorithm depends on the interpretation of the node constant root, and we saw in Section 3.2 that root may denote different elements of the universe in different derivations of F. Although its eventual denotation cannot be uniquely predicted at intermediate steps of the bottom–up process, we can avoid reconsideration of root denotations already determined to be unsatisfactory by carrying along the satisfying denotations in an auxiliary data structure associated with each category in the agenda and the category set. For an LFG rule r that expands A with the c-structure categories X1 .. Xm, a rule A:t:IR → X1:t1:IR1..Xm:tm:IRm is only added to the rule set if there is at least one root expansion of the canonical model that satisfies Inst(r, (t, t1..tm)) and whose root denotation is shared across all daughters. The root denotations of all such expansions are then associated with A:t:IR. The complexity of this test is proportional to the complexity of the instantiated description of the LFG rule and not of the instantiated description of the entire rule component IR, because the rule components of the daughter categories do not need to be reevaluated.
For further optimizations we can make use of context-free strategies that take top–down and bottom–up information into account at the same time. For instance, we can simulate a left-corner enumeration of the search space, considering categories that are reachable from a current goal category and match the left corner of a possible rule. As another option, we can precompute a reachability table for the context-free backbone of G and use it as an additional filter on rule construction. In general, almost any of the traditional algorithms for parsing context-free grammars can be reformulated as a strategy for avoiding the creation of useless categories and rules. We can also use enumeration strategies that focus on the characteristics of the input f-structure. A head-driven strategy (cf., e.g., Shieber et al. 1990; van Noord 1993) identifies the lexical heads first, finds the rules that expand to them, and then uses information associated with those heads, such as their grammatical function assignments, to pick other categories to expand.
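As a rough illustration of the reachability filter mentioned above, the following Python sketch precomputes, for a context-free backbone, the set of categories that each category can dominate. It is only a sketch under assumptions: the rule encoding and the names `reachability` and `useful_for` are hypothetical, and the table would serve purely as an additional filter alongside the satisfiability test.

```python
def reachability(rules):
    """Map each category to the set of categories it can dominate.

    rules: list of (mother, (daughter, ...)) pairs of a context-free backbone.
    The result is the reflexive-transitive closure of the
    mother-daughter relation.
    """
    cats = {m for m, _ in rules} | {d for _, ds in rules for d in ds}
    reach = {c: {c} for c in cats}
    for mother, daughters in rules:
        reach[mother].update(daughters)
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for c in cats:
            new = set().union(*(reach[d] for d in reach[c])) - reach[c]
            if new:
                reach[c] |= new
                changed = True
    return reach

def useful_for(goal, category, reach):
    """True if `category` can occur somewhere below `goal`."""
    return category in reach[goal]
```

A rule whose mother category is not reachable from the start category of the backbone can then be skipped before any f-structure test is attempted.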
5. Other Chart-based Approaches
A bottom–up strategy for grammar construction comes closest to the algorithms of previous chart-based generation proposals. There is a correspondence between the edges that are added incrementally to a generation chart and the context-free rules that we add to the grammar. But chart edges in these proposals typically collapse some of the distinctions that we have in our rules and categories, and therefore these algorithms cannot faithfully interpret the full set of grammatical dependencies. For some grammars and inputs they may produce strings that should not belong to the generated language. In an attempt to guarantee termination these algorithms may also include grammar restrictions or processing limits that unduly narrow the set of legitimate results. We will illustrate some correspondences and differences with the modified algorithm sketched in the previous section (the one guided by the canonical model of the input) by comparing its first few steps with the operations of Kay's (1996) chart-generation algorithm.
To facilitate the comparison, we have adapted the grammar for one of Kay's examples to an equivalent grammar in the LFG formalism. The LFG grammar is given in (38).
- (38)
- (39)
Taking this f-structure as input, the first step of our bottom–up algorithm is to initialize the agenda with the terminal categories. Those categories are sufficient to complete the right sides of the given lexical rules, and so in the next steps the terminal categories are moved to the category set and rules including those in (40) are constructed. These are the ones that can potentially contribute to the generation of the noun phrase the dog. The instantiated descriptions produced with these terms pass our satisfiability test on the canonical model of the input.
- (40)
- (41)
In contrast, we draw from the larger term set that includes in addition the collection of path-terms that combine constants with sequences of attributes. Rules (40c,d) make use of the path-term (root arg1), and it is not unreasonable to extend Kay's approach to create the corresponding edges shown in (42). This would allow his algorithm to be applied to a broader set of grammars and inputs.
- (42)
Continuing with the bottom–up strategy, the categories above will be moved from the agenda to the category set, the rule in (43) will be created from the right-side categories of (40c,d), another NP rule will be created from the constant-instantiated rules in (40a,b), and both new categories will be placed on the agenda.21
- (43)
- (44)
The disadvantage is that an additional condition must be imposed to guarantee that only a finite number of edges will be created so that the chart-construction process does in fact terminate. Kay proposes a use-once restriction that bounds the size of the derivable constituents by the number of predicates in the input. For some grammars and inputs his algorithm will only produce a proper subset of the full set of generable strings. Another disadvantage in comparison to our approach and other approaches in the chart-based family is that Kay's chart edges do not record intermediate generation results in a compact form that allows operations on the generated string set to be carried out in advance of enumerating the individual strings.22
Kay's algorithm is one of a family of chart-based approaches that differ in detail but have similar characteristics at an abstract level. A common thread is that each edge contains a semantic or feature-structure representation aggregated from all of the edges in the subtree that it dominates, and edge creation is filtered by testing whether these representations subsume the generation input. Each algorithm in the family also imposes one or more additional restrictions in an attempt to guarantee termination of the string generation process. Kay appeals to a use-once processing condition, as noted earlier, that ensures termination but may only produce a proper subset of the complete output set.
Shieber's (1988) algorithm and its refinements are closer to our approach in that they do not associate individual terminal strings with the edges of the chart. Each edge contains a semantic or feature-structure representation and a sequence of immediate daughter edges from which that representation can be assembled. The individual substrings consistent with that representation are obtained by a recursive traversal reaching down to the terminal edges. The chart-construction phase of these algorithms (and our grammar construction) will not terminate if the number of distinct edges is not bounded by the size of the input. This may be the case for cyclic inputs, because they have infinitely many distinct unfoldings all of which subsume the input. A separate question, even with a bounded chart, is whether the string-production traversal is guaranteed to terminate with a finite set of strings. A grammar may give rise to infinitely many strings if it has recursive or iterative rules whose feature structures subsume the same portion of the input. Any finite set of output strings for such a grammar and input will necessarily be incomplete.
Shieber suggests that the end-to-end generation process will terminate and produce a finite but complete set of output strings for a restricted class of semantically monotonic grammars. Shieber's condition requires that the semantic representation of every mother phrase is subsumed by the semantic structure of each of its daughter phrases. In LFG terms this condition amounts to the requirement that each daughter is mother-definable (with an annotation of the form (↑ σ) = ↓ for |σ| ≥ 0) and, as a consequence, that strings can be generated only for single-rooted inputs. On deeper analysis, however, we see that this restriction is not sufficient to ensure that the generation process will terminate with a finite output set. It does not by itself preclude grammars that assign cyclic feature structures, and therefore the chart-construction process may be unbounded. And with an acyclic input and a finite chart the complete set of output strings may still be unbounded, since several daughters in a recursive rule may subsume exactly the same portion of the mother's semantic representation. A formal example of this is the monotonic grammar in (8) that produces the string set {a^n b^n | 1 ≤ n}. A stronger restriction on the form of the annotations, namely, that σ is never empty, will guarantee a finite chart and a finite and complete output set, but monotonic grammars in this sense cannot naturally identify the functional or semantic head-daughters that figure prominently in so many linguistic descriptions. It seems that monotonicity is not a particularly helpful restriction and that some other constraint, either on grammars or processing steps, is needed to guarantee an output set containing only a finite number of syntactic variants (cf., e.g., Neumann 1994; Moore 2002).
If we translate Shieber's and other similar algorithms to our framework, we see that their instantiations need only terms involving root and none of the constants drawn from the input structure or the terms containing those constants.23 This is because these algorithms are not set up to control subsumption accurately for multi-rooted inputs and grammars with mother-undefinable daughters, and in fact their result set may be incorrect in those cases. As we have demonstrated, maintaining all of the proper discriminations requires the larger term set and a mechanism with the same effect as our appropriateness and compatibility conditions.
Comparing other chart-based generation proposals to our bottom–up strategy for creating a generation grammar has brought out some similarities but also highlighted some important differences. Chart edges contain information that summarizes the syntactic and semantic contribution of their subtrees and also allows for the correlated terminal strings to be read out by a straightforward traversal. These algorithms cannot attain correctness, completeness, and termination without imposing limits on the kinds of grammatical dependencies that the generator can faithfully interpret, the range of structures that can be provided as input, or the size and number of output strings that can be produced. Our approach operates correctly on a larger class of grammars and inputs because we have more instantiating terms and therefore are able to maintain appropriate discriminations without special restrictions. The resulting grammar gives a finite encoding of the complete set of generated outputs in a well-understood formal system. These can be enumerated on demand in our separate context-free generation phase.
6. Cycles
We have established the context-free result only for acyclic f-structures; the result does not hold for cyclic inputs. This is because the f-structures that correspond to subderivations of a derivation of a cyclic structure are not necessarily bounded by the size of the input. So we might need an infinite number of terms in order to reproduce correctly any discrimination made in the f-description for some subderivation of a cyclic input structure. The following example demonstrates that the set of strings that a grammar relates to a particular cyclic input might not be context-free.24
Consider the LFG grammar G = ({S, A, C}, {a, b, c}, S, R) with the annotated rules R given in (45).
- (45)
- (46)
In general our grammar construction will produce correct outputs for the term set drawn from any finite unfolding of a cyclic input structure, but a complete characterization of the output strings would require an infinite term set. We have not yet investigated the formal properties of the languages that are related to cyclic structures. It is an open research question whether a more expressive system (e.g., indexed grammars or other forms of controlled grammars) can give a finite characterization of the complete string set and whether our context-free grammar construction can be extended to produce such a formal encoding.
7. Other Descriptive Devices
We have shown that the context-free grammar of Definition 16 produces the strings in GenG(F) for an LFG grammar G that characterizes f-structures by means of equality and function application, the most primitive descriptive devices of the LFG formalism. In this section we extend the grammar-construction procedure so that it produces context-free generation grammars that simulate the other formal devices that were originally proposed by Kaplan and Bresnan (1982).25
Completeness and Coherence. The result holds trivially when we also take into account LFG's devices for enforcing the subcategorization requirements of individual predicates, the completeness and coherence conditions. Both conditions are concerned with the semantic-form predicate values that consist of a predicate and a list of governable grammatical functions, as for example, ‘fall〈(subj)〉’ with the list 〈(subj)〉 and ‘john’ with the empty list. An f-structure is complete if each substructure (including the entire structure) that contains a pred also contains all governable grammatical functions its semantic form subcategorizes for. And an f-structure is coherent if all its governable functions are subcategorized by a local semantic form. If an input f-structure F is not complete and coherent, the LFG derivation relation ΔG does not associate it with any strings, and the set GenG(F) is empty. Thus, when we determine by inspection that an input f-structure fails to satisfy these conditions, we maintain the context-free result by assigning it a trivial grammar that generates the empty context-free language.
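The inspection step described above can be sketched in Python as a recursive check over an f-structure encoded as nested dictionaries. This is a minimal illustration under assumptions, not the article's procedure: the inventory `GOVERNABLE`, the encoding of a semantic form as a `(name, functions)` pair, and the function name `complete_and_coherent` are all hypothetical, and set-valued attributes and cyclic structures are not handled.

```python
# Assumed inventory of governable grammatical functions (illustrative only).
GOVERNABLE = {"subj", "obj", "obj2", "comp", "xcomp", "obl"}

def complete_and_coherent(f):
    """Check an acyclic f-structure encoded as nested dicts.

    A 'pred' value is a pair (name, tuple_of_governable_functions),
    e.g. ("fall", ("subj",)) or ("john", ()).
    """
    if not isinstance(f, dict):
        return True  # atomic values impose no requirements
    present = {attr for attr in f if attr in GOVERNABLE}
    required = set(f["pred"][1]) if "pred" in f else set()
    if not required <= present:  # completeness fails
        return False
    if not present <= required:  # coherence fails
        return False
    # Both conditions must hold for every substructure as well.
    return all(complete_and_coherent(v) for a, v in f.items() if a != "pred")
```

When the check fails, the generator in this sketch would simply return a grammar with no productions for the start symbol, which generates the empty language.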
C-Structure Regular Predicates and Disjunctive Functional Constraints. The construction in Section 3.3 produces context-free generation grammars for LFG grammars whose c-structure rules are of an elementary form: Their right-hand sides consist of concatenated sequences of annotated categories, and the equations in the annotation sets are interpreted as simple conjunctions of f-structure requirements. The full LFG notation is more expressive, allowing functional requirements to be stated as arbitrary Boolean combinations of basic assertions. It also allows the right-hand sides of c-structure rules to denote arbitrary regular languages over annotated categories. Rules with the richer notation can be normalized to rules of the necessary elementary form by simple transformations. First, in the regular right-side of each rule every category X with a Boolean combination of primitive annotations is replaced by a disjunction of X's each associated with one of the alternatives of the disjunctive normal form of the original annotation. Then the augmented regular right-sides are converted to a collection of right-linear rewriting rules by systematically introducing new nonterminals and their expansions, as described by Chomsky (1959) (see also Hopcroft and Ullman 1979). The new nonterminals are annotated with ↑ = ↓ equations as needed to ensure that f-structure requirements are properly maintained. The result of these transformations is a set of productions all of which are in conventional context-free format and have no internal disjunctions and which together define the same string/f-structure mapping as a grammar encoded in the original, linguistically more expressive, notation.
Constraining Statements and Negation. The statements in an LFG f-description are divided into two classes: defining and constraining statements. The constraining statements are evaluated once all defining statements have been processed and a minimal model (of the defining statements) has been constructed. The constraining devices introduced by Kaplan and Bresnan (1982) are constraining equations and inequalities, and existential and negative existential constraints. If a constraining statement is contained in an f-description FD, it is evaluated against a minimal model M of the defining statements of FD in the obvious way: M ⊧ t =c t′ iff M ⊧ t = t′ (constraining equation), M ⊧ t iff ∃t′ M ⊧ t = t′ (existential constraint), and M ⊧ ¬γ iff M ⊭ γ (negation of a constraining or defining statement).
We can extend our grammar construction to descriptions with constraining statements by adjusting the definition of IRDF. We modify condition (ii) of Definition 15 so that M|Σ ≅ F for a minimal model M of just the defining statements of Inst(IR) and additionally require M ⊧ γ for all constraints γ of Inst(IR). Then a context-free grammar based on this revised definition will properly reflect the defining/constraining distinction.
The proof of this depends on one further technicality, however. Recall that the constructions that we used in the proof of our main theorem yield in both proof directions FD[ψ] = Inst(IRroot). As a consequence, the constraining statements in Inst(IRroot) are exactly the ones that result from those in FD by substitution with ψ. Suppose that M and Mroot are minimal models of the defining part of FD and Inst(IRroot), respectively. In order to establish also that M satisfies all constraints in FD iff Mroot satisfies the ones contained in Inst(IRroot), it is sufficient to show that M ⊧ t = t′ iff Mroot ⊧ t[ψ] = t′[ψ] holds for all denoting terms. This follows (with M′ as Mroot) from the isomorphic mapping of term denotations provided by Lemma 2', a slightly stronger version of Lemma 2.
Lemma 2'
Let c and ρ be a derivation with f-description FD and f-structure F in G. If ψ is a reducing substitution for c and ρ, and M and M′ (with interpretation functions I and I′) are minimal models of the defining parts of FD and FD[ψ], respectively, then there is an isomorphism h between M|Σ and M′|Σ such that h(I(t)) = I′(t[ψ]) for each interpreted term t or t[ψ].26
Membership Statements. Membership statements are formulas of the form t′ ∈ t. Membership in LFG is interpreted just as a binary relation between functional elements, and a model satisfies a membership statement t′ ∈ t iff the membership relation holds between the denotation of t′ and the denotation of t. Membership statements may introduce daughters that are undefinable in terms of their mother and therefore may be instantiated by constants as we illustrated earlier in our treatment of the (↑ adj) = (↓ ele) annotation. Then, if we expand the isomorphism-based determination of the equivalence of feature structures and feature descriptions in the usual way to sets and set descriptions, membership statements can be handled by our original construction without further modification.27
Semantic Form Instantiation. As described earlier, semantic forms are the single-quoted values of pred attributes in terms of which the completeness and coherence conditions are defined. They are also instantiated, in the sense that for each occurrence of a semantic form in a derivation a new and distinct indexed form is chosen. Because of this special property, semantic forms occurring in annotated rules may be regarded as metavariables that are substituted by the instantiation procedure similar to the familiar ↑ and ↓ symbols. The distinguishing indices on semantic forms are usually only displayed in a graphical representation of an f-structure if this is necessary for clarity, but distinctively indexed semantic forms are always available for appropriately instantiating the LFG rules, just like the other constants that we draw from the input structure. We can extend the mechanism for controlling the correct instantiation of undefinable daughters to ensure that the semantic forms of all simulated derivations are correctly instantiated. As part of an appropriate instantiation of an LFG rule we also substitute for the prototypical semantic forms in the rule distinct indexed forms, drawn from F, and we expand the compatibility condition to this larger set of instantiations.
8. Consequences and Observations
We have shown that a given LFG grammar can be specialized to a context-free grammar that characterizes all and only the strings that correspond to a given (acyclic) f-structure. We can now understand different aspects of generation as pertaining either to the way the specialized grammar GF is constructed or to well-known properties of context-free grammars and context-free generation.
It follows as an immediate corollary, for example, that it is decidable whether the set GenG(F) is empty, contains a finite number of strings, or contains an infinite number of strings. This can be determined by inspecting GF with standard context-free tools, once it has been constructed. If the language is infinite, we can make use of the context-free pumping lemma to identify a finite number of short strings from which all other strings can be produced by repetition of subderivations. Wedekind (1995) first established the decidability of LFG generation and proved a pumping lemma for the generated string set; our theorem provides alternative and very direct proofs of these previously known results.
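The three-way decision can be sketched with textbook context-free techniques. The Python code below is an illustration under assumptions rather than the procedure of the article: it expects a grammar given as `(mother, daughters)` pairs with no empty right-hand sides, and the name `classify` is hypothetical. It first computes the productive symbols, then looks for a reachable recursive rule that adds material on each cycle.

```python
def classify(rules, start, terminals):
    """Return 'empty', 'finite', or 'infinite' for the language of a CFG.

    rules: list of (mother, (daughter, ...)) pairs.
    Assumes that no rule has an empty right-hand side.
    """
    # 1. Productive symbols: those that derive some terminal string.
    productive = set(terminals)
    changed = True
    while changed:
        changed = False
        for m, ds in rules:
            if m not in productive and all(d in productive for d in ds):
                productive.add(m)
                changed = True
    if start not in productive:
        return "empty"
    live = [(m, ds) for m, ds in rules if all(d in productive for d in ds)]

    # 2. Symbols reachable from a given symbol through live rules.
    def reach_from(symbol):
        seen, stack = {symbol}, [symbol]
        while stack:
            x = stack.pop()
            for m, ds in live:
                if m == x:
                    for d in ds:
                        if d not in seen:
                            seen.add(d)
                            stack.append(d)
        return seen

    reachable = reach_from(start)
    # 3. A reachable rule with two or more daughters that can re-derive
    #    its own mother pumps strictly longer strings on every cycle.
    for m, ds in live:
        if m in reachable and len(ds) >= 2 and any(m in reach_from(d) for d in ds):
            return "infinite"
    return "finite"
```

Cycles that consist only of unary rules are deliberately not counted, because they repeat a category without lengthening the string.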
We also have an explanation for another observation of Wedekind (1995). Kaplan and Bresnan (1982) showed that the Nonbranching Dominance Condition (sometimes called Off-line Parsability) is a sufficient condition to guarantee decidability of the membership problem. Wedekind noted, however, that this condition is not necessary to determine whether a given f-structure corresponds to any strings. We now see more clearly why this is the case: If there is a context-free derivation for a given string that involves a nonbranching dominance cycle, we know that there is another derivation for that same string that has no such cycle. Thus, the generated language is the same whether or not derivations with nonbranching dominance cycles are allowed.
There are practical consequences to the two phases of LFG generation. The grammar GF can be provided to a client as a finite representation of the set of perhaps infinitely many strings that correspond to the given f-structure, and the client can then control the process of enumerating individual strings. The client may choose to produce the shortest ones just by avoiding recursive category expansions. Or the client may apply an n-gram model (Langkilde 2000), a stochastic context-free grammar model (Cahill and van Genabith 2006) or a more sophisticated statistical language model trained on a collection of derivations to identify the most probable derivation and thus the presumably most fluent sentence from the set of possibilities (Velldal and Oepen 2006; de Kok, Plank, and van Noord 2011; Zarrieß, Cahill, and Kuhn 2011).
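One simple way for a client to obtain the shortest strings first is a length-ordered search over sentential forms. The Python sketch below uses that technique, which differs from the strategy of avoiding recursive expansions mentioned above; it assumes a grammar with no empty right-hand sides, and the names `shortest_strings`, `limit`, and `max_len` are hypothetical.

```python
import heapq

def shortest_strings(rules, start, limit, max_len=20):
    """Return up to `limit` distinct strings of the language, shortest first.

    rules: list of (mother, (daughter, ...)) pairs with non-empty
    right-hand sides, so sentential forms never shrink.
    max_len bounds the search and guarantees termination.
    """
    nonterminals = {m for m, _ in rules}
    heap = [(1, (start,))]
    seen = {(start,)}
    out = []
    while heap and len(out) < limit:
        _, form = heapq.heappop(heap)
        # Position of the leftmost nonterminal, if any.
        idx = next((i for i, s in enumerate(form) if s in nonterminals), None)
        if idx is None:
            out.append(" ".join(form))  # fully terminal: emit the string
            continue
        for m, ds in rules:
            if m == form[idx]:
                new = form[:idx] + tuple(ds) + form[idx + 1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    heapq.heappush(heap, (len(new), new))
    return out
```

A statistical ranking could be substituted for the length key without changing the overall structure of the search.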
We have assumed in our construction that terminals are morphologically unanalyzed, full-form words. A more modular arrangement is to factor morphological generalizations into a separate formal specification with less expressive power than LFG rules can provide, namely, a regular relation (Karttunen, Kaplan, and Zaenen 1992; Kaplan and Kay 1994). The analysis of a sentence then consists of mapping the string of words into a string of morphemes to which the LFG grammar is then applied. The full relation between strings of words and associated f-structures is then the composition of the regular morphology with an LFG language over morpheme strings. To generate with such a combined system, we can produce the context-free morpheme strings corresponding to the input f-structure, and then pass those results through the morphology. Because the class of context-free languages is closed under composition with regular relations and regular relations are closed under inversion, the resulting set of word strings will remain context-free.
Our proof also depends on the assumption that the input F is fully specified so that the set of possible instantiations is finite. Dymetman (1991), van Noord (1993), and Wedekind (1999, 2006) have shown that it is in general undecidable whether or not there are any strings associated with a structure that is an arbitrary extension of the f-structure provided as the input. Indeed, our proof of context-freeness does not go through if we allow new elements to be hypothesized arbitrarily, beyond the ones that appear in F; if this is permitted, we cannot establish a finite bound on the number of possible categories. This is unfortunate, because there may be interesting practical situations in which it is convenient to leave unspecified the value of a particular feature. If we know in advance that there can be only a finite number of possible values for an underspecified feature, however, the context-free result can still be established. We create from F a set of alternative structures {F1, ..,Fn} by filling in all possible values of the unspecified features, and for each of them we produce the corresponding context-free grammar. Because a finite union of context-free languages is context-free, the set of strings generated from any of these structures must again remain in that class. Of course, this is not a particularly efficient technique: It introduces and propagates features that the grammar may never actually interrogate, and it needlessly repeats the construction of common subgrammars that do not make reference to the alternative feature specifications. The amount of computation may be reduced by adapting methods from the parsing literature that operate on conjunctive equivalents of disjunctive feature constraints (e.g., Karttunen 1984; Maxwell and Kaplan 1991).
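The fill-in-and-union technique can be sketched as follows. The Python code is a minimal illustration under assumptions: f-structures are flat dictionaries, the open features and their finite value sets are supplied explicitly, the nonterminals of the individual grammars are assumed to be disjoint, and the names `fill_in`, `union_grammar`, and `S_union` are hypothetical.

```python
from itertools import product

def fill_in(f, open_features):
    """Enumerate the fully specified variants F1,..,Fn of a flat f-structure.

    open_features maps each unspecified attribute to its finite value set.
    """
    attrs = sorted(open_features)
    for values in product(*(open_features[a] for a in attrs)):
        yield {**f, **dict(zip(attrs, values))}

def union_grammar(grammars, new_start="S_union"):
    """Union of CFGs given as (start, rules) pairs.

    Adds one fresh start symbol with a unary rule to each original start;
    the nonterminal sets are assumed to be pairwise disjoint.
    """
    rules = []
    for start, rs in grammars:
        rules.append((new_start, (start,)))
        rules.extend(rs)
    return new_start, rules
```

The sketch also makes the inefficiency visible: the number of variants is the product of the sizes of the value sets, and every variant gets its own grammar even when the alternatives never matter to the rules.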
Our theorem helps us to understand better the problem of ambiguity-preserving generation. We showed previously that the problem is undecidable in the general case (Wedekind and Kaplan 1996). But our generation result does enable us to make that decision under certain recognizable circumstances, namely, if the intersection of the sentence sets assigned to the different f-structures is computable. This is true if the sentences belong to some formally restricted subsets of the context-free languages, for example, finite sets or regular languages; this is the unstated presupposition of Knight and Langkilde's (2000) parse-forest technique. For a set of f-structures {F1,..,Fn} we construct the context-free grammars GF1,..,GFn and inspect them with standard context-free tools to determine whether each language L(GFi) belongs to an intersectable subclass (i = 1,..,n). If each of them meets this condition, we can compute the intersection to find any sentences that are derived ambiguously with f-structures F1,..,Fn.
We have shown in this article that the context-free property also holds for other descriptive devices as originally proposed by Kaplan and Bresnan (1982). In Wedekind and Kaplan (forthcoming) we broaden the grammar-construction procedure so that it produces context-free generation grammars that simulate the more sophisticated mechanisms that were introduced and adopted into later versions of the LFG formalism. Among these are devices for the f-structure characterization of long-distance dependencies and coordination: functional uncertainty (Kaplan and Maxwell 1988a; Kaplan and Zaenen 1989), set distribution for coordination, and the interaction of uncertainty and set distribution (Kaplan and Maxwell 1988b). We also extend to devices whose evaluation depends on properties of the c-structure to f-structure correspondence, namely, functional categories and extended heads (Zaenen and Kaplan 1995; Kaplan and Maxwell 1996) and functional precedence (Bresnan 1995; Zaenen and Kaplan 1995). Of course, the context-free result trivially holds for purely abbreviatory notations such as templates, lexical rules, and complex categories (Butt et al. 1996; Kaplan and Maxwell 1996; Dalrymple, Kaplan, and Holloway King 2004; Crouch et al. 2008); these clearly help in expressing linguistic generalizations but can be formally treated in the obvious way by translating their occurrences into the more basic descriptions that they abbreviate. In contrast, the restriction operator (Kaplan and Wedekind 1993) requires more careful consideration. Restriction can cause the functional information associated with intermediate c-structure nodes not to be included in the f-structures of higher nodes. This is formally quite tractable if the restricted information is provided to the generator as a separately rooted f-structure. 
Otherwise, the f-structure input is essentially underspecified, and thus, as discussed earlier, a context-free generation grammar can be produced just in case restriction can eliminate only a finite amount of information (see also Wedekind 2006).
A final comment concerns the generation problem for other high-order grammatical formalisms. The PATR formalism also augments a context-free backbone with a set of feature-structure constraints, but it differs from LFG in that its metavariables allow constraints on one daughter to refer directly to sister feature structures that may not be mother-definable. It is relatively straightforward to extend our lemmas and theorem so that they apply to a more general notion of definability that encompasses sisters as well as mothers. We can thus establish the context-free result for a broader family of formalisms that share the property of being endowed with a context-free base. On the other hand, it is not clear whether the string set corresponding to an underlying Head-driven Phrase Structure Grammar (HPSG) feature structure is context-free. HPSG (Pollard and Sag 1994) does not make direct use of a context-free skeleton, and operations other than concatenation may be used to assemble a collection of substrings into an entire sentence. We cannot extend our proof to HPSG unless the effect of these mechanisms can be reduced to an equivalent characterization with a context-free base. Grammars written for the ALE system's logic of typed feature structures (Carpenter and Penn 1994), however, do have a context-free component and therefore are amenable to the treatment we have outlined.
In sum, this article offers a new way to conceptualize the generation problem for LFG and other higher-order grammatical formalisms with context-free backbones. Distinguishing the grammar-specialization phase from a string-enumeration phase provides a mathematical framework for understanding the formal properties of the generated string sets. It also provides a framework for analyzing and understanding the computational behavior of existing approaches to generation. Existing algorithms operate properly on restricted grammars and inputs and thus only approximate a complete solution to the problem. They typically implement particular techniques for optimizing the size of the search space and bounding the amount of computation required by the generation process. Our formulation can allow a larger and perhaps more attractive set of candidates to be safely considered, and it also makes available a collection of familiar tools that may suggest new ways of improving algorithmic performance.
From a more general perspective, there has been no deep tradition for the formal analysis of higher-order generation akin to the richness of our mathematical and computational understanding of parsing. The approach outlined in this article, we hope, will serve as a major step in redressing that imbalance.
Acknowledgements
We are indebted to John Maxwell for many fruitful and insightful discussions of the LFG generation problem. Hadar Shemtov, Martin Kay, and Paula Newman have also offered criticisms and suggestions that have helped to clarify many of the mathematical and computational issues. We also thank the anonymous reviewers for their valuable comments on an earlier draft. This work was carried out while R. M. K. was at the Palo Alto Research Center and at Microsoft Corporation.
Notes
Dymetman (1997) extends this characterization to unification grammars.
The word “derivation” here and in the following is used only to characterize the notion of well-formedness in LFG and is not meant to undermine the contrast between LFG and conventional transformational approaches to syntax.
For some sentences and some grammars (e.g., those involving functional uncertainty) there are f-descriptions with several non-isomorphic minimal models. As we discuss in a subsequent article (Wedekind and Kaplan forthcoming), these also lie within the bounds of our context-free construction.
Note that the interpretation of the node constants that we restrict out of the minimal models can be seen to represent the structural correspondence function that maps individual nodes of the c-structure tree into elements of the f-structure (cf. Kaplan 1995).
The German grammar was developed as part of the Parallel Grammar project (ParGram), a research and development consortium that has produced large-scale LFG grammars for several languages (Butt et al. 1996, 2002). These grammars are developed on the XLE system, a high-performance platform for LFG parsing and generation. More information on the ParGram project and XLE can be found at: http://pargram.b.uib.no/.
Note that this definition permits equations containing terms of the form (↓ σ), with σ nonempty, and that it does not require ↑ and ↓ to occur in the annotation of each category. It thus allows for grammars that assign to sentences multiply rooted f-structures or f-structures consisting of totally unconnected parts. We take the “f-structure of a sentence” to be the collection of all elements that correspond to c-structure nodes, even those that are not accessible from the root node's f-structure.
Usually, conditions (vi) and (vii) are taken to be additional nonlogical axiom schemata of some traditional equational logic expressive enough to axiomatize LFG's underlying feature logic. Because we are not primarily interested in completely axiomatizing LFG's formal devices within some appropriate meta-theory, we enforce the special properties of LFG's atomic feature values by definition and assume that standard first-order logic with equality is used to determine satisfiability.
It may be the case that such a left-branching proof must start with a reflexive equation t = t that is not in FD but can be inferred by partial reflexivity from an equation (t σ) = t″ in FD. Partial reflexivity is the restriction of reflexivity to well-defined (object denoting) terms. It is a sound inference rule for the theory of partial functions for which full reflexivity does not hold.
From a computational point of view, a constant a_a can be regarded as the address of a or a pointer to a.
According to our formalization, attribute symbols are interpreted by unary partial functions over the universe and atomic value symbols by elements of the universe. Thus the graph indicates, for example, that the interpretation function assigns to the attribute symbol pred the (unary) partial function {(a, b), (e, h), (f, i), (g, j)} and to the attribute symbol ele the partial function {(f, d), (g, d)}. Furthermore, it interprets the atomic value symbol ‘fall〈(subj)〉’ as denoting b and past as denoting c.
Note that we cannot establish the converse of (i). This is because a daughter node constant that is not interpreted in a minimal model of the instantiated description of a rule might occur in a statement introduced by a rule expanding that daughter. In a minimal model corresponding to a larger derivational context such a daughter constant might thus belong to the interpreted symbols.
Of course, the constant ψi−1 cannot occur as a proper subterm of any other ψi−1 value. If ψi−1 were to map a node n′ to (a_a σ) then n′ must be m-definable and there must be a node dominating n′ that is not m-definable and mapped to a_a. Because a_a ≠ root, a_a must instantiate a daughter of another rule, contradicting compatibility.
Note that the particular substitution that we construct in the proof of Lemma 3 reduces FD to an equivalent description that is satisfied in an expansion of by an interpretation for root in .
The set of terminals TF is constructed from the full term set instead of just ⊥ to allow for the possibility of ↓ appearing in lexical entries (e.g., Zaenen and Kaplan 1995).
Cf. Hopcroft and Ullman (1979).
The rule component IR is a refinement of the third component of the categories defined in Kaplan and Wedekind (2000). The categories there were distinguished by Inst(IR), the descriptions produced by collecting the instantiated annotations from our third-component rules, and thus give a more compact representation whenever different subderivations provide the same instantiated description. As we demonstrated, such a simpler representation is sufficient to control the generation process for grammars with a conventional set of descriptive devices. Instantiated descriptions, however, do not provide enough information for grammars with devices whose evaluation requires the c-structure to be taken into account, as, for example, functional precedence (Bresnan 1995; Zaenen and Kaplan 1995). We show in Wedekind and Kaplan (forthcoming) that these devices can be modeled with our more elaborate rule representation.
Note that the licensing LFG rule might not be uniquely determined if the derivation in GF simulates recursions.
Kay provides a flat, unordered collection of separate propositions as input to the generation process, but the difference between a flat and a hierarchical arrangement is not material to our discussion. We have translated his constants s, d, c into the elements of our f-structure input, and we have mapped his propositions (dog(d), arg1(s, d), ...) into equivalent attribute–value relationships. By the same token, because here we are focusing on the organization of data structures, we note without further comment that his active-passive computational schema is but one way of specializing our general bottom-up algorithm.
Kay's instantiated semantics corresponds more directly to the third components of the categories of the less sophisticated grammar construction of Kaplan and Wedekind (2000). These instantiated descriptions collapse some of the distinctions of our third-component rules that are not needed for the limited range of dependencies that Kay is considering.
The NP based on the rules (40a,b) will not survive into a larger derivation in our framework. This is because all NP daughters are mother-definable in this grammar, and therefore the a_d instantiation is not appropriate for the arg daughter of S.
Maxwell (2006) describes a variant of Kay's algorithm that provides a more compact representation for the generated string sets and also deals efficiently with disjunctive input structures.
Moore (2002) observed that the basic properties of the algorithm do not change if semantically vacuous constituents are allowed. In this case the translation would require the additional term ⊥ to reduce nodes that are not interpreted in a model of the f-description.
Because LFG theory has evolved away from the original c-structural encoding of long-distance dependencies, we will not consider it here. In Wedekind and Kaplan (forthcoming) we describe the construction for grammars that use functional uncertainty, the device that superseded the initial mechanism for characterizing long-distance dependencies.
The proof requires an elaboration of the argument used in the proof of Lemma 2. Following the inductive construction of that proof, it is easy to see that I(t) = Ii(t[ψi]) holds for all terms t and t[ψi] that are interpreted in or . Because there must be an isomorphism h between and any other minimal model of the defining part of FD[ψ], and thus h(I(t)) = I′(t[ψ]) for each interpreted term t or t[ψ].
Rounds (1988) proposes a bisimulation-based characterization of sets and set membership. This would require a more sophisticated analysis, but it is more of mathematical than linguistic interest.
References
Author notes
Center for Language Technology, University of Copenhagen, Njalsgade 140, 2300 Copenhagen S, Denmark.
Nuance Communications, Inc., 1198 East Arques Avenue, Sunnyvale, CA 94085, USA.