## Abstract

The universal generation problem for LFG grammars is the problem of determining whether a given grammar derives any terminal string with a given f-structure. It is known that this problem is decidable for acyclic f-structures. In this brief note, we show that for those f-structures the problem is nonetheless intractable. This holds even for grammars that are off-line parsable.

The universal generation problem for LFG grammars (Kaplan and Bresnan 1982) is the problem of determining for an arbitrary grammar *G* and an arbitrary f-structure *F* whether *G* derives any terminal string with *F*. This has been shown to be undecidable even for grammars that are off-line parsable (Wedekind 2014). If *F* is acyclic, however, Wedekind and Kaplan (2012) have shown that the problem is decidable. They prove that the set of strings that an LFG grammar relates to an acyclic f-structure can be described by a context-free grammar. Decidability of the problem then follows because the emptiness problem is decidable for context-free languages. To date, however, the complexity status of this problem has been unknown.

In this brief note, we show the intractability of LFG’s generation problem from acyclic f-structures by a polynomial-time reduction from the 3-SAT problem, a problem that is known to be NP-complete. The 3-SAT problem is the problem of determining the satisfiability of a Boolean formula in conjunctive normal form where each of the conjoined clauses is a disjunction of three literals. That is, each formula is a conjunction of the form *C*_{1} ∧ .. ∧ *C*_{m}, each clause *C*_{j} is a disjunction of the form *l*_{j1} ∨ *l*_{j2} ∨ *l*_{j3}, and each literal *l*_{jk}, *k* = 1, .., 3, is a propositional variable *p*_{i} or a negated variable ¬*p*_{i}. Without loss of generality, we assume in the following that every literal occurs only once in a clause and that no clause contains both a variable and its complement.
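To make the problem statement concrete, the following sketch represents a 3-SAT instance as a list of clauses and checks both the simplifying assumptions and satisfiability by brute force. The encoding of a literal as a signed integer (+*i* for *p*_{i}, −*i* for ¬*p*_{i}) and the function names are assumptions made for illustration only.

```python
from itertools import product

# Assumed convention: a literal is a nonzero int, +i for p_i and -i for ¬p_i.
# A clause is a tuple of three literals; a formula is a list of clauses.
Clause = tuple[int, int, int]

def meets_assumptions(clauses: list[Clause]) -> bool:
    """Check that no literal is repeated in a clause and that no clause
    contains both a variable and its complement."""
    for clause in clauses:
        if len(set(clause)) != len(clause):
            return False
        if any(-lit in clause for lit in clause):
            return False
    return True

def satisfiable(clauses: list[Clause], n: int) -> bool:
    """Brute-force check over all 2**n truth assignments to p_1, .., p_n."""
    for values in product([False, True], repeat=n):
        # values[i - 1] is the truth value assigned to p_i
        if all(any(values[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

psi = [(1, 2, 3), (-1, -2, 3), (-1, 2, -3)]  # the formula in (5)
print(meets_assumptions(psi), satisfiable(psi, 3))  # True True
```

The exhaustive loop over 2^{n} assignments is exactly the kind of search that the reductions below transfer to LFG derivations.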

To state the generation problem more formally, recall that an LFG grammar *G* defines a binary derivation relation Δ_{G} between terminal strings and f-structures, as given in (1).

- (1)
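Δ_{G}(*s*, *F*) if and only if *G* derives terminal string *s* with f-structure *F*

The acyclic generation problem is then the problem of determining for an arbitrary LFG grammar *G* and an arbitrary acyclic f-structure *F* whether {*s* ∣ Δ_{G}(*s*, *F*)} is empty or not.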

## Reductions to Problem-specific Grammars

For any instance *ψ* = *C*_{1} ∧ .. ∧ *C*_{m} of the 3-SAT problem over variables *p*_{1}, .., *p*_{n}, we construct an LFG grammar *G*_{ψ} and an acyclic f-structure *F*_{ψ} such that there is a string *s* with (*s*, *F*_{ψ}) ∈ Δ_{Gψ} if and only if *ψ* is satisfiable.

The grammar *G*_{ψ} includes the start rule

- (2)

and, for each propositional variable *p*_{i}, two terminal rules of the form (3a, b).

- (3)

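The rules in (3) collect for each variable *p*_{i} the clauses *C*_{j} that are true if *p*_{i} is true (by the annotations (↑c_{j}) = true in (3a)) and the clauses that are true if ¬*p*_{i} is true (3b). Because every derivation chooses exactly one of the two rules for each variable, it encodes a truth assignment, and its f-structure contains c_{j} true exactly for the clauses made true by that assignment. Thus, there is a truth assignment for the variables that makes all clauses *C*_{j} of *C*_{1} ∧ .. ∧ *C*_{m} = *ψ* true if and only if *G*_{ψ} derives terminal string $^{n} with the f-structure

- (4)
*F*_{ψ} = [c_{1} true, .., c_{m} true]

in which for all clauses *C*_{j} the c_{j} attributes have value true. Hence, *ψ* is satisfiable if and only if *G*_{ψ} derives a terminal string with the f-structure *F*_{ψ} in (4).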

*G*_{ψ} has 2*n* + 1 rules: a single start rule (2) of length *n* with *n* annotations and, for each of the *n* propositional variables, two rules (3) of constant length, with a total of 3*m* annotations. The input structure *F*_{ψ} has size *m* (measured in the number of attribute-value pairs). The rules and the input can be constructed just by scanning *ψ*, which is of length 3*m*, from left to right. In each step, however, the list of rules already built is scanned to check whether a new annotation has to be added to an existing P_{i} rule or a new P_{i} rule has to be created. During the same scan a new daughter is added to the start rule if needed. Thus, the total time needed to construct the grammar rules and the input is at most of order *m* ⋅ *n*, a polynomial in the size of the original 3-SAT problem.
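The construction of *G*_{ψ} and *F*_{ψ} can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: literals are signed integers, rules are hypothetical (lhs, rhs, annotations) triples, and the ↑=↓ annotations on the start rule are assumed, since the text only says that the start rule carries *n* annotations.

```python
def build_g_psi(clauses, n):
    """Build the annotated rules of G_psi and the input f-structure F_psi.

    Assumed convention: a literal is +i for p_i and -i for ¬p_i.
    Rules are hypothetical (lhs, rhs, annotations) triples."""
    pos = {i: [] for i in range(1, n + 1)}  # indices j of clauses containing p_i
    neg = {i: [] for i in range(1, n + 1)}  # indices j of clauses containing ¬p_i
    for j, clause in enumerate(clauses, start=1):  # one left-to-right scan of psi
        for lit in clause:
            (pos if lit > 0 else neg)[abs(lit)].append(j)
    # Start rule (2): S -> P1 .. Pn; the ↑=↓ annotations are an assumption.
    rules = [("S", [f"P{i}" for i in range(1, n + 1)], ["↑=↓"] * n)]
    for i in range(1, n + 1):
        for js in (pos[i], neg[i]):  # rule (3a) for p_i, then rule (3b) for ¬p_i
            rules.append((f"P{i}", ["$"], [f"(↑ c{j}) = true" for j in js]))
    f_psi = {f"c{j}": "true" for j in range(1, len(clauses) + 1)}  # input (4)
    return rules, f_psi

psi = [(1, 2, 3), (-1, -2, 3), (-1, 2, -3)]  # the formula in (5)
rules, f_psi = build_g_psi(psi, 3)
print(len(rules), sum(len(r[2]) for r in rules[1:]), len(f_psi))  # 7 9 3
```

The counts match the text: 2*n* + 1 = 7 rules, 3*m* = 9 clause annotations, and an input of size *m* = 3. The dictionaries replace the rescanning of the rule list described above; this only tightens the *m* ⋅ *n* bound and does not affect the argument.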

By construction, there are 2^{n} annotated c-structure derivations in *G*_{ψ}, for any 3-SAT instance *ψ* over *n* propositional variables, and all those derivations derive the string $^{n}. Thus, in the worst case, 2^{n} annotated c-structure derivations may have to be examined to determine whether (*s*, *F*_{ψ}) ∈ Δ_{Gψ}, for any string *s*.^{1}

As a simple illustration consider the satisfiable formula in (5).

- (5)
*ψ*=*C*_{1}∧*C*_{2}∧*C*_{3}= (*p*_{1}∨*p*_{2}∨*p*_{3}) ∧ (¬*p*_{1}∨ ¬*p*_{2}∨*p*_{3}) ∧ (¬*p*_{1}∨*p*_{2}∨ ¬*p*_{3})

- (6)
- (7)
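For the formula in (5), the exhaustive inspection of the 2^{n} derivations of $^{n} can be simulated directly. In this sketch (the names and the signed-integer encoding of literals are assumptions), a derivation is a choice between rule (3a) and rule (3b) for each *P*_{i}, and its f-structure collects the annotations (↑c_{j}) = true of the chosen rules.

```python
from itertools import product

# Formula (5); +i encodes p_i and -i encodes ¬p_i (assumed convention).
psi = [(1, 2, 3), (-1, -2, 3), (-1, 2, -3)]
n = 3

def annotations(i, positive):
    """Clause indices j for which the P_i rule (3a) (positive=True) or
    (3b) (positive=False) carries the annotation (↑ c_j) = true."""
    lit = i if positive else -i
    return {j for j, clause in enumerate(psi, start=1) if lit in clause}

f_psi = {f"c{j}": "true" for j in range(1, len(psi) + 1)}

matches = []
for choice in product([True, False], repeat=n):  # all 2**n derivations of $^n
    # The f-structure of a derivation unites the annotations of its chosen rules.
    f = {f"c{j}": "true"
         for i, positive in enumerate(choice, start=1)
         for j in annotations(i, positive)}
    if f == f_psi:
        matches.append(choice)

print(f"{len(matches)} of {2 ** n} derivations match F_psi")  # 5 of 8 ...
```

Each matching derivation corresponds to a satisfying assignment of (5); the two non-matching assignments besides the all-false one are (true, false, true), which falsifies *C*_{3}, and (true, true, false), which falsifies *C*_{2}.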

In order to guarantee decidability of the recognition problem, Kaplan and Bresnan (1982) introduced a constraint, later called the Off-line Parsability Constraint, that proscribes empty productions and nonbranching dominance chains and thus bounds the number and size of the c-structures of a string by a function of the length of that string. Because the grammars *G*_{ψ} do not contain empty productions and do not produce nonbranching dominance chains, the acyclic generation problem is intractable even for off-line parsable LFGs.
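The two conditions named here can be checked mechanically on the c-structure skeleton of a grammar. The sketch below assumes one simple formalization: no production has an empty right-hand side, and there is no cycle of unary nonterminal rules such as A → B → A, since only such a cycle lets a nonbranching chain repeat a category. The function name and the (lhs, rhs) rule format are hypothetical, and annotations are dropped because the constraint concerns c-structure only.

```python
def offline_parsable(rules, nonterminals):
    """rules: list of (lhs, rhs) pairs. True if no rule is empty and the
    graph of unary nonterminal rules A -> B contains no cycle."""
    if any(len(rhs) == 0 for _, rhs in rules):
        return False  # empty production
    unary = {}
    for lhs, rhs in rules:
        if len(rhs) == 1 and rhs[0] in nonterminals:
            unary.setdefault(lhs, set()).add(rhs[0])
    visiting, done = set(), set()

    def has_cycle(a):
        # Depth-first search for a cycle in the unary-rule graph.
        if a in visiting:
            return True
        if a in done:
            return False
        visiting.add(a)
        found = any(has_cycle(b) for b in unary.get(a, ()))
        visiting.discard(a)
        done.add(a)
        return found

    return not any(has_cycle(a) for a in list(unary))

# Skeleton of G_psi for n = 3: start rule S -> P1 P2 P3 and two P_i -> $ rules each.
g_psi = [("S", ["P1", "P2", "P3"])] + [(f"P{i}", ["$"]) for i in (1, 2, 3) for _ in range(2)]
print(offline_parsable(g_psi, {"S", "P1", "P2", "P3"}))  # True
```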

Note also that a transposition of this reduction can be used to show the intractability of the recognition problem for off-line parsable LFGs. This transposition is intrinsically simpler than Berwick’s original reduction (Berwick 1982). The grammar *G*′_{ψ} includes the start rule

- (8)

and, for each literal *l*_{jk} of *l*_{j1} ∨ *l*_{j2} ∨ *l*_{j3} = *C*_{j}, a terminal rule of the form (9).

- (9)

Thus, there are 3^{m} annotated c-structure derivations for a string of length *m* that may have to be inspected to determine whether there is an f-structure *F* with ($^{m}, *F*) ∈ Δ_{G′ψ}.

*G*′_{ψ} has 3*m* + 1 rules: one rule of constant size for each literal in each disjunctive clause *C*_{j} (i.e., 3*m* rules in total) and a single start rule of length *m* with *m* annotations. Because the rules for the literals of each clause are independent of the rules for the literals in other clauses, already constructed rules need not be rescanned. Thus, the total time needed to construct the grammar rules is at most of order *m*.
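The transposed reduction can be sketched in the same style. The exact form of rule (9) is not reproduced in the text, so this sketch assumes that the rule for a literal of *C*_{j} rewrites *C*_{j} as $ with the annotation (↑p_{i}) = true for *p*_{i} and (↑p_{i}) = false for ¬*p*_{i}, and that the start rule (8) combines *C*_{1} .. *C*_{m}; the start rule is not modeled explicitly. A derivation of $^{m} picks one rule per clause, and its f-description is satisfiable if and only if no variable receives both values, which is also the quick test that places recognition in NP.

```python
from itertools import product

def build_g_prime(clauses):
    """Terminal rules for C_j, one per literal l_jk, each reduced to its assumed
    annotation (variable index, value). Each clause is handled independently."""
    return {f"C{j}": [(abs(lit), lit > 0) for lit in clause]
            for j, clause in enumerate(clauses, start=1)}

def recognizes(clauses):
    """Is there an f-structure F such that $^m is derived with F?
    Inspects up to 3**m derivations; a derivation is consistent iff its
    annotations never assign a variable both true and false."""
    rules = build_g_prime(clauses)
    for derivation in product(*rules.values()):
        f = {}
        # setdefault returns the stored value, which clashes on a contradiction
        if all(f.setdefault(var, val) == val for var, val in derivation):
            return True
    return False

psi = [(1, 2, 3), (-1, -2, 3), (-1, 2, -3)]  # the formula in (5)
print(recognizes(psi), recognizes([(1, 1, 1), (-1, -1, -1)]))  # True False
```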

We have demonstrated that LFG’s acyclic generation and recognition problems can be reduced from the 3-SAT problem in polynomial time. Because the satisfiability of the f-description of a given annotated c-structure can be tested quickly, LFG’s recognition problem is in NP and hence NP-complete. For the acyclic generation problem, however, it is not yet clear whether it belongs to NP, because the problem of deciding whether the input f-structure and the f-structure assigned to a given derivation are structurally identical is an instance of the isomorphism problem for labeled directed acyclic graphs. Because it is not yet known whether that problem can be solved in polynomial time (Basin 1994), with current knowledge we can only establish that the acyclic generation problem is NP-hard.

In these reductions, the size parameters *n* and *m* of the 3-SAT problem instances are reflected in certain size parameters of the corresponding LFG grammars, namely, the length of the rules and the number of attributes. These technical demonstrations reveal the expressive power of the basic LFG formalism, but they do not immediately carry over to the way that the formalism is deployed in linguistic practice. Grammars of natural language are not revised and specialized for every input that is presented for recognition or generation. Rather, they describe particular natural languages with a fixed number of rules and attributes that are intended to operate correctly on inputs of arbitrary size. We can make our analysis more directly relevant by providing a grammatical framework with a fixed set of rules and attributes to which 3-SAT problems of any size can be reduced. In this framework the particular problem to be solved is not encoded in the grammar but is presented as the input, either as a string or an f-structure.

## Reductions to Generic Grammars

We first describe the generic reduction for the recognition problem. We encode the specific literals *p*_{i}, ¬*p*_{i} as sequences of *i* $ terminals followed by + or − ($^{i}+, $^{i}−). The literal ¬*p*_{3} is thus represented as the string $$$−.
A clause is represented as the concatenation of the representations of its 3 literals and a whole problem as the concatenation of the representations of its clauses. Hence, the 3-SAT formula (5) is represented as the terminal string $ + $ $ + $ $ $ + $ − $ $ − $ $ $ + $ − $ $ + $ $ $ −. The LFG grammar consists of the 8 rules in (10). The S rules generalize the start rule in (8) to an unlimited number of clauses. Because it is now string encodings of satisfiable 3-SAT instances that are to be recognized, the C rules expand to three L daughters that derive the representations of the three literals, and they guess, similar to (9) but now through trivial annotations, the true literal in each clause. The L rules derive the string representations of the literals, $^{i}+ or $^{i}−, and assign to them attribute-chain encodings of the form p^{i} val true or p^{i} val false.

- (10)

The string representation of a 3-SAT instance *ψ* with *m* clauses can be constructed by scanning *ψ* from left to right. Thus the total time needed to construct the input string is at most of order *m*.
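The string encoding used as input to the generic recognition grammar can be sketched as a pair of functions. The signed-integer literals and the function names are assumptions for illustration, and the ASCII hyphen stands in for the minus sign used in the text.

```python
def encode(clauses):
    """Encode p_i as '$' * i + '+' and ¬p_i as '$' * i + '-' (ASCII '-' for −),
    concatenating literals and clauses in a single left-to-right scan."""
    return "".join("$" * abs(lit) + ("+" if lit > 0 else "-")
                   for clause in clauses for lit in clause)

def decode(s):
    """Invert encode: a literal ends at each sign; group literals by three."""
    lits, count = [], 0
    for ch in s:
        if ch == "$":
            count += 1
        else:
            lits.append(count if ch == "+" else -count)
            count = 0
    return [tuple(lits[k:k + 3]) for k in range(0, len(lits), 3)]

psi = [(1, 2, 3), (-1, -2, 3), (-1, 2, -3)]  # the formula in (5)
print(encode(psi))  # $+$$+$$$+$-$$-$$$+$-$$+$$$-
```

The printed string is the terminal string given above for (5), written without the separating spaces.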

The generic reduction for the acyclic generation problem is more involved. For convenience, we use the traditional parenthetic notation for optional annotated categories and optional annotations and {..|..} for disjunction.

We define a generic grammar *G* and construct for any 3-SAT problem instance *ψ* an f-structure *F*_{ψ} such that *G* derives a terminal string with *F*_{ψ} if and only if *ψ* is satisfiable. *F*_{ψ} contains an encoding of *ψ* and a solution structure, and the units of both structures are linked by edges labeled with the attribute s.

We represent the problem with attribute-chain encodings for the literals and also for the clauses: *p*_{i} is represented by p^{i} pos, ¬*p*_{i} by p^{i} neg, and *C*_{j} by c^{j}. Then, a 3-SAT problem with *n* variables and *m* clauses is encoded in the input f-structure through chains that match the regular expression p^{i} {pos|neg} c^{j} occ+, with *i* ≤ *n*, *j* ≤ *m*, where p^{i} pos c^{j} occ+ records that *p*_{i} occurs in *C*_{j} and p^{i} neg c^{j} occ+ records that ¬*p*_{i} occurs in *C*_{j}. For our 3-SAT example (5), the problem encoding is schematically illustrated in the black substructure of the input f-structure depicted in Figure 2.

The solution structure corresponds to an attribute-chain conversion of the input structure (4) of the problem-specific reduction, that is, it is represented through attribute-value chains c^{j} val true, *j* = 1, .., *m*. The s edges link the encoding of every literal to the root of the solution structure and the encoding of every clause to its encoding in the solution structure. For our 3-SAT example (5), these components of the input f-structure are depicted in Figure 2 in green (solution structure) and red (s edges).
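The chains of the input f-structure can be listed mechanically. This sketch writes each chain as a space-separated sequence of attributes followed by its value; reading "occ+" as the attribute occ with value + is an assumption, and the reentrant s edges are omitted because a flat list of chains cannot show shared substructure.

```python
import re

# Assumed reading of p^i {pos|neg} c^j occ+ : attribute occ with value "+".
PROBLEM_CHAIN = re.compile(r"(p )+(pos|neg) (c )+occ \+")

def input_chains(clauses):
    """Return the problem-encoding chains and the solution-structure chains
    (literals are +i for p_i and -i for ¬p_i, an assumed convention)."""
    problem = []
    for j, clause in enumerate(clauses, start=1):
        for lit in clause:
            polarity = "pos" if lit > 0 else "neg"
            problem.append(" ".join(["p"] * abs(lit) + [polarity] + ["c"] * j + ["occ", "+"]))
    solution = [" ".join(["c"] * j + ["val", "true"]) for j in range(1, len(clauses) + 1)]
    return problem, solution

psi = [(1, 2, 3), (-1, -2, 3), (-1, 2, -3)]  # the formula in (5)
problem, solution = input_chains(psi)
print(problem[-1])  # p p p neg c c c occ +
```

The longest attribute chain, p^{n} {pos|neg} c^{m} occ, has *n* + *m* + 2 attributes, which is the maximum depth stated for the input f-structure below.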

The rules in (11) derive the chain encodings for an arbitrary number of literals.

- (11)

The C productions in (12) allow the grammar to encode, through the optional occ+ annotations, all possible occurrences of each literal in an arbitrary number of clauses, and the s reentrancies link every derived clause representation to the representation of that clause in the solution structure.

- (12)

The rules in (11) and (12) thus derive for every 3-SAT problem instance *ψ* a string with an f-structure that contains the encoding of *ψ*. To see that the rules derive the problem encoding together with the solution structure only if *ψ* is satisfiable, let us consider the derivations that yield the problem encoding for a particular 3-SAT instance with *n* variables and *m* clauses. Because the disjunction of the P rules encodes the possible truth assignments to a variable, the recursive application of the P rule alternatives encodes in 2^{n} derivations all possible truth assignments for the *n* variables of the given problem. By construction, derivations for literals that are assigned true are expanded only by C_{true} rules, not by C rules. Also, if such a literal occurs in *C*_{j}, then only the C_{true} rules add val true to the encoding of *C*_{j} in the solution structure, to indicate that the variable assignment that makes that literal true also makes *C*_{j} true.

Thus, if a clause representation in the solution structure lacks the val true information, the truth assignment encoded in that derivation does not make that clause true. Hence, an individual derivation matches the input if and only if the variable assignment it encodes satisfies every clause, and a problem is unsatisfiable if no derivation matches the input. This can be made particularly clear by the trivial unsatisfiable 3-SAT problem instance (*p*_{1} ∨ *p*_{1} ∨ *p*_{1}) ∧ (¬*p*_{1} ∨ ¬*p*_{1} ∨ ¬*p*_{1}), even though it does not comply with the simplifying assumptions that we made at the beginning. For the encoding of this problem, the LFG grammar provides the derivations depicted in Figure 3.
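The bookkeeping behind this argument can be abstracted into a few lines. The sketch does not model f-structure unification or the rules in (11) and (12); it only simulates, under that simplification, which clause encodings receive val true in each of the 2^{n} derivations, with literals given as signed integers (an assumed convention) and a hypothetical function name.

```python
from itertools import product

def generic_generation_matches(clauses, n):
    """Return the truth assignments whose derivation matches the input.

    Each of the 2**n derivations fixes one P rule alternative per variable.
    Only C_true expansions, i.e. literals made true, add 'val true' to the
    solution-structure encoding of the clauses in which they occur."""
    m = len(clauses)
    matching = []
    for values in product([True, False], repeat=n):
        marked = set()  # clauses whose solution encoding received val true
        for j, clause in enumerate(clauses, start=1):
            for lit in clause:
                if values[abs(lit) - 1] == (lit > 0):  # literal true: C_true rule
                    marked.add(j)
        if len(marked) == m:  # every c^j carries val true, as the input demands
            matching.append(values)
    return matching

trivial = [(1, 1, 1), (-1, -1, -1)]  # the unsatisfiable instance above
print(generic_generation_matches(trivial, 1))  # []
```

For the trivial instance, the derivation assigning true to *p*_{1} leaves *C*_{2} without val true and the one assigning false leaves *C*_{1} without it, so neither matches, in line with Figure 3.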

The generic LFG has 11 rules. The input f-structure, which has maximum depth *n* + *m* + 2, can be constructed by scanning *ψ* (with length 3*m*) from left to right and by creating in each step an occurrence encoding of a literal in a clause. Because the input structure has to be scanned top-down in each step, the total time needed to construct the input is at most of order *m*^{2} + *mn*.^{2}
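The depth and cost figures in this paragraph can be checked with a little arithmetic. The derivation below assumes, as a reading of the text rather than a statement from it, that the deepest chain has the form p^{n} {pos|neg} c^{m} occ and that each of the 3*m* scanning steps walks at most one such chain top-down.

```latex
\underbrace{n}_{p^{n}} + \underbrace{1}_{\mathrm{pos/neg}} + \underbrace{m}_{c^{m}} + \underbrace{1}_{\mathrm{occ}} = n + m + 2,
\qquad
3m \cdot (n + m + 2) = 3m^{2} + 3mn + 6m \in O(m^{2} + mn)
```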

## Concluding Remarks

This note has introduced new results concerning the complexity of LFG generation (and recognition) for grammars that assign acyclic f-structures to input strings. We observed that reductions to problem-specific grammars where the size parameters of the 3-SAT problem instances are reflected in grammar-size parameters are not particularly relevant to the linguistic enterprise. More relevant are the generic reductions where the LFG grammar is kept fixed across all possible 3-SAT problem instances. These show that computational complexity in the worst case can grow exponentially as a function of the size of the input f-structure or string, and that grammar or derivation restrictions of some sort must be imposed for tractability of LFG generation and recognition.

Wedekind and Kaplan (2020) introduced a subclass of LFG grammars with particular restrictions that exclude our generic reduction grammars but ensure that at least the recognition problem is tractable for inputs of arbitrary length. Wedekind and Kaplan further argue that grammars that meet the conditions of this *k*-bounded subclass are still expressive enough for natural language description (see also Kaplan and Wedekind 2019). However, it is at this point still an open question whether generation is polynomial for arbitrary grammars in this linguistically plausible subclass or whether polynomial generation can be established only for grammars within a yet more restricted proper subclass of the *k*-bounded class.

## Acknowledgments

The authors would like to thank the three anonymous reviewers for their helpful suggestions and comments on earlier drafts of this squib.

## Notes

1. On this analysis the number of derivations depends on the number of variables *n* but not on the number of clauses *m*. Tovey (1982) relates these parameters by showing that 3-SAT is NP-complete for instances where each variable or its complement appears in at most four clauses. Thus, 3*m* ≤ 4*n*, and hence *n* ≥ ¾*m*, so that 2^{n} ≥ 2^{3m/4}. This establishes that the number of annotated c-structure derivations that have to be inspected in order to solve the acyclic generation problem can also be exponential in (a fraction of) the number of clauses.

2. Note that the generic reduction for generation, as well as the problem-specific reductions, also work for general SAT problem instances, and that the 3-literal rule for the recognition problem can easily be extended so that it solves such problem instances too.